# Model Roles Introduction
ByteBuddy's model role system allows you to select the most suitable model configuration for different task types, optimizing performance and cost-effectiveness.
## What are Model Roles?
Model roles are pre-configured model settings for specific task types, including:
- Model selection
- Parameter tuning
- Context management
- Performance optimization
## Supported Role Types

### 💬 Chat
For conversation and interactive tasks:
- Natural language understanding
- Contextual dialogue
- Multi-turn conversations
- User support
### 🔧 Autocomplete
For code and text completion:
- Real-time code suggestions
- Smart text completion
- Context-aware completion
- Fast response
### ✏️ Edit
For text and code editing:
- Content refactoring
- Format adjustment
- Syntax correction
- Style optimization
### 🎯 Apply
For specific application scenarios:
- Code generation
- Document creation
- Data processing
- Automated tasks
### 🔍 Embeddings
For vectorization and semantic search:
- Text vectorization
- Semantic similarity
- Retrieval augmentation
- Clustering analysis
### 📊 Reranking
For result sorting and filtering:
- Relevance ranking
- Quality assessment
- Priority adjustment
- Precise matching
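As a rough rule of thumb, the generation-style roles trade creativity against determinism through sampling temperature. The mapping below summarizes the example values used in the configurations later on this page (an illustrative Python sketch, not part of ByteBuddy's API; embedding and reranking models take no completion options, so they are omitted):

```python
# Typical sampling temperatures per role, drawn from the example
# configurations later on this page. These are starting points,
# not hard requirements.
ROLE_TEMPERATURES = {
    "chat": 0.7,          # conversational variety
    "edit": 0.2,          # precise, deterministic edits
    "apply": 0.3,         # reliable task execution
    "autocomplete": 0.1,  # stable, fast suggestions
}
```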
## Role Configuration Structure

### Basic Configuration Example

Configure model roles in `config.yaml`:

```yaml
models:
  - name: "chat-assistant"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 8192 # ⚠️ IMPORTANT: Use model's maximum supported value
      topP: 0.9
  - name: "code-editor"
    provider: "anthropic"
    model: "claude-3-sonnet"
    apiKey: "${ANTHROPIC_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 200000 # ⚠️ IMPORTANT: Use model's maximum supported value
  - name: "autocomplete-engine"
    provider: "together"
    model: "codellama/CodeLlama-13b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 256 # Autocomplete can use smaller values
```

⚠️ **CRITICAL CONFIGURATION NOTE:** When configuring `maxTokens` for model roles, always set it to the model's maximum supported value (except for autocomplete roles). Setting `maxTokens` too low will cause output truncation, potentially leaving responses incomplete or code snippets unfinished.
**Why this matters:**

- Chat roles: Need sufficient tokens for detailed explanations and conversations
- Edit roles: Require enough tokens for complete code refactoring and documentation
- Apply roles: Need maximum tokens for complex task execution and code generation
- Autocomplete roles: Can use smaller values (128-256 tokens) since suggestions are brief

**Model-specific maximums (context windows):**

- GPT-4: 8,192 tokens
- GPT-4 Turbo: 128,000 tokens
- Claude 3 models: 200,000 tokens
- Gemini Pro: 32,768 tokens
- Local models: Check the model documentation

Always check the model provider's documentation for the exact maximum context length and configure `maxTokens` accordingly.
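The limits above can be encoded in a small helper that flags misconfigured entries before they cause truncated output. This is a minimal sketch, not a ByteBuddy feature: the function name and the limits table are illustrative, and the table values should be re-verified against provider documentation since they change over time.

```python
# Known maximum token limits from the table above. Verify against the
# provider's documentation before relying on these values.
MODEL_MAX_TOKENS = {
    "gpt-4": 8_192,
    "gpt-4-turbo": 128_000,
    "claude-3-sonnet": 200_000,
    "gemini-pro": 32_768,
}

def check_max_tokens(model: str, max_tokens: int, roles: list) -> list:
    """Return a list of warnings for one model entry's maxTokens setting."""
    warnings = []
    limit = MODEL_MAX_TOKENS.get(model)
    if limit is None:
        warnings.append(f"{model}: unknown model, check its documentation")
    elif max_tokens > limit:
        warnings.append(f"{model}: maxTokens {max_tokens} exceeds limit {limit}")
    elif "autocomplete" not in roles and max_tokens < limit:
        # Non-autocomplete roles should use the maximum to avoid truncation.
        warnings.append(f"{model}: maxTokens {max_tokens} is below maximum {limit}")
    return warnings
```

For example, a chat model configured with `maxTokens: 2000` on `gpt-4` would be flagged, while `maxTokens: 256` on an autocomplete role passes cleanly.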
## Multi-Role Configuration

A single model can serve multiple roles:

```yaml
models:
  - name: "versatile-model"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat", "edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4096
```

## Role-Specific Configuration
### Chat Role Configuration

```yaml
models:
  - name: "chat-model"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000
      topP: 0.9
```

### Edit Role Configuration
```yaml
models:
  - name: "edit-model"
    provider: "anthropic"
    model: "claude-3-sonnet"
    apiKey: "${ANTHROPIC_API_KEY}"
    roles: ["edit"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4000
```

### Autocomplete Role Configuration
```yaml
models:
  - name: "autocomplete-model"
    provider: "together"
    model: "codellama/CodeLlama-13b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 256
```

### Apply Role Configuration
```yaml
models:
  - name: "apply-model"
    provider: "openai"
    model: "gpt-4"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["apply"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 4096
```

### Embed Role Configuration
```yaml
models:
  - name: "embed-model"
    provider: "openai"
    model: "text-embedding-3-large"
    apiKey: "${OPENAI_API_KEY}"
    roles: ["embed"]
```

### Rerank Role Configuration
```yaml
models:
  - name: "rerank-model"
    provider: "cohere"
    model: "rerank-english-v3.0"
    apiKey: "${COHERE_API_KEY}"
    roles: ["rerank"]
```

## Best Practices
### 1. Role Design Principles

- Single Responsibility: Each role focuses on specific tasks
- Clear Boundaries: Define clear applicable scenarios
- Performance Optimization: Balance quality and efficiency

### 2. Configuration Management

- Version Control: Track role configuration changes
- Test Validation: Ensure role behavior meets expectations
- Documentation: Keep configuration documentation updated

### 3. Usage Guidelines

- Start Small: Begin with predefined roles
- Gradual Customization: Adjust based on actual needs
- Continuous Optimization: Improve roles based on feedback
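Once roles are assigned in `config.yaml`, each task needs to be routed to a model that serves its role. The sketch below shows one plausible lookup strategy (first matching entry wins); the helper function and the trimmed config shape are illustrative assumptions, not ByteBuddy's actual API:

```python
from typing import Optional

# Illustrative helper: resolve the first configured model that serves a
# given role. The entries mirror the YAML examples above, trimmed to the
# fields relevant for lookup.
def model_for_role(models: list, role: str) -> Optional[dict]:
    for entry in models:
        if role in entry.get("roles", []):
            return entry
    return None

config_models = [
    {"name": "chat-assistant", "roles": ["chat"]},
    {"name": "code-editor", "roles": ["edit", "apply"]},
    {"name": "autocomplete-engine", "roles": ["autocomplete"]},
]
```

Here `model_for_role(config_models, "edit")` returns the `code-editor` entry, while a role no model serves returns `None`, which callers should surface as a configuration error rather than silently falling back.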
## Next Steps

- Learn About Roles: Review the documentation for each specific role type
- Configure Models: Set up model roles in `config.yaml`
- Test and Adjust: Continuously improve based on usage
- Share Experience: Share quality configurations with your team