Groq
Groq provides extremely fast AI inference, focused on real-time responses and low-latency applications.
Supported Models
Llama 3.1
- llama-3.1-405b-reasoning - Reasoning-specialized model
- llama-3.1-70b-versatile - Versatile model
- llama-3.1-8b-instant - Fast response model
Llama 3
- llama-3-70b-8192 - 70B model (8,192-token context)
- llama-3-8b-8192 - 8B model (8,192-token context)
Mixtral
- mixtral-8x7b-32768 - Mixtral MoE model (32,768-token context)
Other Models
- gemma-7b-it - Google Gemma model
Configuration
Basic Configuration
Configure in config.yaml or ~/.bytebuddy/config.yaml:
```yaml
models:
  - name: "groq-llama"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```
Real-Time Chat Configuration
```yaml
models:
  - name: "groq-instant"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
```
Multi-Model Configuration
```yaml
models:
  - name: "groq-instant"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 1024
  - name: "groq-versatile"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
  - name: "groq-reasoning"
    provider: "groq"
    model: "llama-3.1-405b-reasoning"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 8192
```
Configuration Fields
Required Fields
- name: Unique identifier for the model configuration
- provider: Set to "groq"
- model: Model name
- apiKey: Groq API key
Optional Fields
- roles: Model roles: [chat, edit, apply, autocomplete]
- defaultCompletionOptions: Default completion options
  - temperature: Controls randomness (0-2)
  - maxTokens: Maximum number of tokens to generate
  - topP: Nucleus sampling parameter
  - frequencyPenalty: Frequency penalty
  - presencePenalty: Presence penalty
  - stopSequences: Stop sequences
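A configuration exercising all of the optional completion options might look like the following sketch. The field names match the list above; the specific values are illustrative, not recommendations:

```yaml
models:
  - name: "groq-tuned"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7        # randomness: 0 = deterministic, 2 = maximum
      maxTokens: 4096         # cap on generated tokens
      topP: 0.9               # nucleus sampling cutoff
      frequencyPenalty: 0.0   # discourage repeated tokens
      presencePenalty: 0.0    # discourage repeated topics
      stopSequences: ["###"]  # stop generation at these strings
```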
Environment Variables
```bash
# ~/.bashrc or ~/.zshrc
export GROQ_API_KEY="your-groq-api-key"
```
Getting an API Key
1. Visit the Groq Console (console.groq.com)
2. Register and log in to your account
3. Navigate to the API Keys page
4. Create a new API key
5. Save the key to an environment variable
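To confirm the key works before wiring it into the configuration, you can query Groq's OpenAI-compatible model listing endpoint. This sketch assumes the standard https://api.groq.com/openai/v1 base URL; check the current Groq documentation if it fails:

```bash
# Lists the models available to your key; a 401 response means the key is invalid.
curl -s https://api.groq.com/openai/v1/models \
  -H "Authorization: Bearer $GROQ_API_KEY"
```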
Use Case Configurations
Real-Time Chat
```yaml
models:
  - name: "real-time-chat"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
```
Code Generation
```yaml
models:
  - name: "code-gen"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096
```
Complex Reasoning
```yaml
models:
  - name: "reasoning"
    provider: "groq"
    model: "llama-3.1-405b-reasoning"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 16384
```
Code Completion
```yaml
models:
  - name: "autocomplete"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 512
```
Speed Advantages
Groq's LPU (Language Processing Unit) technology provides industry-leading inference speed:
- Real-Time Response: Millisecond-level latency
- High Throughput: Supports massive concurrent requests
- Stable Performance: Consistent inference speed across requests
- Low Cost: Fast inference lowers per-request cost
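To benefit from this speed interactively, requests should be streamed so tokens render as they are generated. A minimal sketch against Groq's OpenAI-compatible chat endpoint (the URL and stream flag follow the OpenAI convention; verify against current Groq docs):

```bash
# -N disables buffering so server-sent event chunks print as they arrive.
curl -s -N https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "stream": true,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
```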
Troubleshooting
Common Errors
- 401 Unauthorized: Check that the API key is correct
- 429 Too Many Requests: Rate limit reached; back off and retry
- Model Not Available: Confirm the model name is correct
- Context Length Exceeded: Shorten the input or lower maxTokens
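When diagnosing these errors, it helps to see both the HTTP status code and the error body Groq returns. A sketch using curl (same endpoint as above; the request payload is a placeholder):

```bash
# Prints the response body, then the HTTP status code on its own line.
curl -s -w "\nHTTP %{http_code}\n" \
  https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "ping"}]}'
```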
Debugging Steps
1. Verify the API key format and validity
2. Check rate limits
3. Confirm the model name spelling
4. Check the Groq status page
5. Monitor usage quotas
Rate Limits
- Free Tier: 30 requests per minute
- Paid Tier: Higher rate limits based on subscription plan
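Clients that exceed these limits receive 429 responses and should back off before retrying. A minimal exponential-backoff sketch in shell (endpoint and payload as in the examples above; a production client would also honor any Retry-After header):

```bash
# Retry on HTTP 429 with exponential backoff: 1s, 2s, 4s, 8s.
for delay in 1 2 4 8; do
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    https://api.groq.com/openai/v1/chat/completions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "ping"}]}')
  [ "$status" != "429" ] && break
  echo "Rate limited; retrying in ${delay}s..." >&2
  sleep "$delay"
done
echo "Final status: $status"
```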
Best Practices
1. Model Selection
- Real-Time Applications: Use llama-3.1-8b-instant
- Complex Tasks: Use llama-3.1-70b-versatile
- Reasoning Tasks: Use llama-3.1-405b-reasoning
- Code Completion: Use the instant model with a low temperature
2. Performance Optimization
- Leverage Groq's speed advantage with streaming responses
- Choose appropriate model size for different tasks
- Set reasonable maxTokens limits
- Implement effective caching strategies
3. Cost Control
- Monitor API usage
- Use smaller models for simple tasks
- Set quota alerts
- Optimize prompt length
4. Security
- Use environment variables to store API keys
- Rotate keys regularly
- Monitor for unusual usage patterns
Use Cases
Groq is particularly suitable for:
- Real-Time Chat Applications - Millisecond response times
- Code Completion - Fast instant suggestions
- Interactive Education - Smooth learning experience
- Game AI - Real-time decision-making and dialogue
- Customer Service Bots - Instant customer responses