# Model Configuration
ByteBuddy supports multiple AI models, allowing you to choose the most suitable model according to your needs.
## Supported Model Types

### Large Language Models
- OpenAI Models: GPT-3.5, GPT-4 series
- Anthropic Models: Claude series models
- Open Source Models: Local models running through Ollama, LM Studio, etc.
- Other Providers: Support for OpenAI-compatible APIs
## Model Roles
ByteBuddy supports configuring different models for different roles:
- `chat`: For conversation interaction and complex task processing
- `edit`: For code editing tasks
- `apply`: For code application operations
- `autocomplete`: For real-time code completion
- `embed`: For embedding vector generation
- `rerank`: For search result reranking
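The role list above can be pictured as a first-match lookup over the configured models. This is only an illustrative sketch of the idea, not ByteBuddy's actual resolution logic; the model names echo the configuration examples later on this page:

```python
# Illustrative only: resolve which configured model serves a role,
# scanning entries in order and returning the first match.
MODELS = [
    {"name": "gpt-4", "roles": ["chat", "edit", "apply"]},
    {"name": "claude-3-sonnet", "roles": ["chat", "autocomplete"]},
]

def model_for_role(role, models=MODELS):
    for entry in models:
        if role in entry["roles"]:
            return entry["name"]
    return None  # no model configured for this role
```

If several models claim the same role, ordering in the config decides which one wins under this (assumed) first-match scheme.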
## Basic Configuration

### Configuration File Structure

Configure models in a `config.yaml` in the project root directory or in `~/.bytebuddy/config.yaml`:
```yaml
# config.yaml
name: My ByteBuddy Config
version: 0.0.1
schema: v1

models:
  - name: "gpt-4"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles:
      - chat
      - edit
      - apply
  - name: "claude-3-sonnet"
    provider: "anthropic"
    model: "claude-3-sonnet"
    apiKey: "${ANTHROPIC_API_KEY}"
    roles:
      - chat
      - autocomplete
  - name: "local-llama"
    provider: "ollama"
    model: "llama2"
    apiBase: "http://localhost:11434"
    roles:
      - chat
```

### Basic Configuration Parameters
Each model supports the following basic parameters:
- `name`: Name of the model configuration
- `provider`: Model provider (`openai`, `anthropic`, `ollama`, etc.)
- `model`: Specific model name
- `apiKey`: API key (can use environment variables)
- `apiBase`: API base URL (optional)
- `roles`: List of roles the model plays
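A config loader typically rejects entries that omit required fields. The sketch below assumes `name`, `provider`, and `model` are mandatory (an assumption based on the list above; ByteBuddy's own validation may differ):

```python
# Assumed-required fields; apiKey/apiBase/roles are treated as optional here.
REQUIRED_FIELDS = {"name", "provider", "model"}

def validate_model_entry(entry):
    # Raise if any assumed-required field is missing from a model entry.
    missing = REQUIRED_FIELDS - entry.keys()
    if missing:
        raise ValueError(f"model entry missing fields: {sorted(missing)}")
    return True
```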
## Advanced Configuration Options

### Completion Options Configuration
```yaml
models:
  - name: "gpt-4-turbo"
    provider: "openai"
    model: "gpt-4-turbo"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000
      topP: 0.9
      presencePenalty: 0.1
      frequencyPenalty: 0.1
      stop: ["\n\n", "###"]
```

### Important: maxTokens Configuration
⚠️ **Warning:** Always configure `maxTokens` to the highest value the model supports. `maxTokens` caps the model's *output*, and the output shares the context window with the prompt; setting it too low truncates responses:
```yaml
models:
  - name: "gpt-4"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 8192  # GPT-4 maximum context length
      topP: 0.9
```

**Why this is important:**
**Avoid truncation:** If `maxTokens` is set too low, the model's output will be cut off prematurely, potentially leaving responses incomplete or code snippets unfinished.

**Model capabilities:** Different models have different maximum token limits:
- GPT-4: 8,192 tokens
- GPT-4 Turbo: 128,000 tokens
- Claude 3 Opus: 200,000 tokens
- Claude 3 Sonnet: 200,000 tokens
- Gemini Pro: 32,768 tokens
- Local models: Varies by model size and configuration
**Task requirements:** Different tasks require different token budgets:
- Brief responses: 1,000-2,000 tokens
- Code generation: 4,000-8,000 tokens
- Documentation: 8,000-16,000 tokens
- Complex analysis: 16,000+ tokens
**Best practice:** Always check the model's documentation for its maximum context length and set `maxTokens` accordingly. It is better to set a higher limit and let the model decide when to stop than to truncate potentially important content.
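The arithmetic behind this advice is simple: prompt and output share one context window, so the usable output budget is whatever the prompt leaves over, capped by `maxTokens`. A small sketch (verify context-window sizes against your model's documentation):

```python
def output_budget(context_window, prompt_tokens, max_tokens):
    # Output tokens compete with the prompt for the same context window;
    # the effective budget is the smaller of maxTokens and what remains.
    return max(0, min(max_tokens, context_window - prompt_tokens))
```

For example, with GPT-4's 8,192-token window and a 3,000-token prompt, at most 5,192 tokens remain for output no matter how high `maxTokens` is set.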
Example configurations for common models:
```yaml
# OpenAI Models
- name: "gpt-4-turbo"
  provider: "openai"
  model: "gpt-4-turbo"
  defaultCompletionOptions:
    maxTokens: 128000  # Maximum for GPT-4 Turbo

- name: "gpt-4"
  provider: "openai"
  model: "gpt-4"
  defaultCompletionOptions:
    maxTokens: 8192  # Maximum for GPT-4

# Anthropic Models
- name: "claude-3-opus"
  provider: "anthropic"
  model: "claude-3-opus"
  defaultCompletionOptions:
    maxTokens: 200000  # Maximum for Claude 3 models

# Google Models
- name: "gemini-pro"
  provider: "google"
  model: "gemini-pro"
  defaultCompletionOptions:
    maxTokens: 32768  # Maximum for Gemini Pro
```

### Autocomplete Options
```yaml
models:
  - name: "claude-autocomplete"
    provider: "anthropic"
    model: "claude-3-haiku"
    apiKey: "${ANTHROPIC_API_KEY}"
    roles: ["autocomplete"]
    autocompleteOptions:
      maxPromptTokens: 2000
      debounceDelay: 300
      modelTimeout: 10000
      useCache: true
      useImports: true
      useRecentlyEdited: true
```

**Note for autocomplete:** For the autocomplete role, `maxTokens` should typically be set much lower (e.g., 128-256 tokens), since suggestions are usually brief. For other roles such as chat, edit, and apply, use the model's maximum supported value.
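To see what `debounceDelay` buys you, here is an illustrative model (not ByteBuddy's implementation) of trailing-edge debouncing: a keystroke only triggers a completion request if no further keystroke arrives within the debounce window:

```python
def triggered_requests(keystroke_times_ms, debounce_ms=300):
    # Count keystrokes followed by a quiet gap of at least debounce_ms;
    # the final keystroke always fires once typing stops.
    fired = 0
    for i, t in enumerate(keystroke_times_ms):
        is_last = i + 1 == len(keystroke_times_ms)
        if is_last or keystroke_times_ms[i + 1] - t >= debounce_ms:
            fired += 1
    return fired
```

A rapid burst like `[0, 100, 200]` ms triggers a single request, so the model is not called on every keystroke.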
### Request Options
```yaml
models:
  - name: "configured-model"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat"]
    requestOptions:
      timeout: 30000
      verifySsl: true
      headers:
        "User-Agent": "ByteBuddy/1.0"
      extraBodyProperties:
        custom_field: "value"
```

## Environment Variable Configuration
For security reasons, it is recommended to use environment variables to store API keys:
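The `${VAR}` references used in the configuration examples are resolved against the environment at load time. How ByteBuddy performs the substitution internally is not specified here; the sketch below shows the common pattern (the empty-string fallback for unset variables is an assumption):

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def expand_env(value):
    # Replace each ${VAR} with the environment value, or "" if unset.
    return _PLACEHOLDER.sub(lambda m: os.environ.get(m.group(1), ""), value)
```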
```shell
# Environment variable setup
export OPENAI_API_KEY="your-openai-key"
export ANTHROPIC_API_KEY="your-anthropic-key"
```

Use them in the configuration file:
```yaml
models:
  - name: "secure-model"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
```

## Configuration Examples for Different Providers
### OpenAI Models
```yaml
models:
  - name: "gpt-4"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    apiBase: "https://api.openai.com/v1"
    roles: ["chat", "edit"]
```

### Anthropic Models
```yaml
models:
  - name: "claude-3-opus"
    provider: "anthropic"
    model: "claude-3-opus-20240229"
    apiKey: "${ANTHROPIC_API_KEY}"
    roles: ["chat", "edit"]
```

### Local Ollama Models
```yaml
models:
  - name: "local-llama3"
    provider: "ollama"
    model: "llama3:8b"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
```

### Custom OpenAI-Compatible API
```yaml
models:
  - name: "custom-provider"
    provider: "openai-compatible"
    model: "custom-model"
    apiKey: "${CUSTOM_API_KEY}"
    apiBase: "https://your-custom-api.com/v1"
    roles: ["chat"]
```

## Using Company Internal Models
If your company has deployed an internal model service that complies with the OpenAI API specification, you can use it directly with ByteBuddy—even if the model isn't in the official supported provider list!
Simply set `provider` to `"openai"`, then configure your company's internal model name and API address:
```yaml
models:
  - name: "company-internal-model"
    provider: "openai"                             # Key: use the openai provider
    model: "your-company-model-name"               # Your company's internal model name
    apiKey: "${COMPANY_API_KEY}"                   # Company internal API key
    apiBase: "https://your-company-ai-api.com/v1"  # Company internal API address
    roles: ["chat", "edit", "autocomplete"]        # Assign roles as needed
```

**Configuration key points:**
- `provider` must be set to `"openai"`: this ensures ByteBuddy uses the standard OpenAI API call format.
- The `model` field should contain your company's internal model name, such as `"qwen-max"`, `"glm-4"`, or `"internlm2"`.
- `apiBase` should point to your company's internal API gateway; make sure the URL includes the `/v1` path (if your company's API follows the OpenAI standard).
- `apiKey` should use your company-assigned key, ideally managed through environment variables.
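A quick way to confirm a gateway really is OpenAI-compatible is to check that it accepts the standard chat-completions URL and payload. The sketch below only builds the request; the base URL and model name are placeholders, not real endpoints:

```python
import json

# Placeholder values for illustration; substitute your company's gateway.
API_BASE = "https://your-company-ai-api.com/v1"

def build_chat_request(model, prompt):
    # An OpenAI-compatible service expects POST {apiBase}/chat/completions
    # with a JSON body naming the model and carrying a messages array.
    url = f"{API_BASE}/chat/completions"
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, payload
```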
This approach allows you to seamlessly integrate your company's internal AI model services, enjoying the same experience as official OpenAI models while keeping your data secure within your organization.
## Model Capability Configuration

ByteBuddy supports declaring specific capabilities for a model:
```yaml
models:
  - name: "gpt-4-vision"
    provider: "openai"
    model: "gpt-4-vision-preview"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat"]
    capabilities:
      - "tool_use"
      - "image_input"
```

Supported model capabilities:
- `tool_use`: Support for tool calling
- `image_input`: Support for image input
- `next_edit`: Support for next-edit mode
## Cache Configuration
```yaml
models:
  - name: "cached-model"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    cacheBehavior:
      cacheSystemMessage: true
      cacheConversation: true
```

## Troubleshooting
### Common Issue Resolution

#### API Connection Failure

```yaml
# Increase the request timeout
requestOptions:
  timeout: 60000
```

#### Slow Model Response

```yaml
# Use a faster model
- name: "fast-model"
  provider: "openai"
  model: "gpt-3.5-turbo"
  # ... other configuration
```

#### API Key Issues
- Ensure environment variables are set correctly
- Check API key permissions
- Verify the model is in the available list
## Best Practices

### Security
- Always use environment variables to store API keys
- Rotate API keys regularly
- Limit API key permission scope
### Performance Optimization
- Choose appropriate models for different tasks
- Set `temperature` and `maxTokens` sensibly
- Enable caching to reduce duplicate requests
### Cost Control
- Monitor token usage
- Use smaller models for simple tasks
- Set reasonable context length limits
### Reliability
- Configure multiple models as fallbacks
- Set appropriate timeouts
- Handle API rate-limit errors
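The fallback idea in the first bullet can be sketched as an ordered scan over configured model names; `is_healthy` stands in for whatever availability check you use (ping, recent error rate) and is an assumption of this sketch:

```python
def first_available(model_names, is_healthy):
    # Try configured models in order and fall back to the next on failure.
    for name in model_names:
        if is_healthy(name):
            return name
    raise RuntimeError("no configured model is available")
```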