Groq

Groq provides extremely fast AI inference, focused on real-time responses and low-latency applications.

Supported Models

Llama 3.1

  • llama-3.1-405b-reasoning - Largest model, specialized for complex reasoning
  • llama-3.1-70b-versatile - General-purpose model balancing quality and speed
  • llama-3.1-8b-instant - Small model optimized for the fastest responses

Llama 3

  • llama3-70b-8192 - 70B parameter model, 8,192-token context
  • llama3-8b-8192 - 8B parameter model, 8,192-token context

Mixtral

  • mixtral-8x7b-32768 - Mixtral 8x7B mixture-of-experts (MoE) model, 32,768-token context

Other Models

  • gemma-7b-it - Google Gemma 7B instruction-tuned model

Configuration

Basic Configuration

Configure in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "groq-llama"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Real-Time Chat Configuration

yaml
models:
  - name: "groq-instant"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Multi-Model Configuration

yaml
models:
  - name: "groq-instant"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 1024

  - name: "groq-versatile"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

  - name: "groq-reasoning"
    provider: "groq"
    model: "llama-3.1-405b-reasoning"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 8192

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "groq"
  • model: Groq model ID (see Supported Models above)
  • apiKey: Groq API key

Optional Fields

  • roles: Model roles [chat, edit, apply, autocomplete]
  • defaultCompletionOptions:
    • temperature: Controls randomness (0-2; lower is more deterministic)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling cutoff
    • frequencyPenalty: Penalizes tokens proportionally to how often they have already appeared
    • presencePenalty: Penalizes tokens that have appeared at all, encouraging new topics
    • stopSequences: Strings that end generation when produced
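
Putting these together, here is a sketch of a single model entry with every optional field set (the values are illustrative, not recommendations):

yaml
models:
  - name: "groq-full-options"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7        # randomness, 0-2
      maxTokens: 4096         # cap on generated tokens
      topP: 0.9               # nucleus sampling cutoff
      frequencyPenalty: 0.2   # discourage frequent repeats
      presencePenalty: 0.1    # discourage reusing any seen token
      stopSequences: ["\n\n"] # stop generation at these strings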

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export GROQ_API_KEY="your-groq-api-key"

Getting API Key

  1. Visit the Groq Console at https://console.groq.com
  2. Register or log in to your account
  3. Navigate to the API Keys page
  4. Create a new API key
  5. Save the key to an environment variable (as shown above)
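
To confirm the key works before wiring it into config, one quick check is a minimal request against Groq's OpenAI-compatible endpoint (the model ID can be any from the list above):

bash
# Minimal request to verify the API key; expects a JSON chat completion back
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8
  }'

A 401 here means the key is wrong or missing; a 200 with a JSON body means the key is live.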

Use Case Configurations

Real-Time Chat

yaml
models:
  - name: "real-time-chat"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Code Generation

yaml
models:
  - name: "code-gen"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096

Complex Reasoning

yaml
models:
  - name: "reasoning"
    provider: "groq"
    model: "llama-3.1-405b-reasoning"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 16384

Code Completion

yaml
models:
  - name: "autocomplete"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 512

Speed Advantages

Groq's LPU (Language Processing Unit) technology provides industry-leading inference speed:

  • Real-Time Response: Millisecond-level latency
  • High Throughput: Supports massive concurrent requests
  • Stable Performance: Consistent latency across requests
  • Low Cost: High throughput lowers per-token cost

Troubleshooting

Common Errors

  1. 401 Unauthorized: The API key is missing or invalid
  2. 429 Too Many Requests: Rate limit reached; back off and retry (see Rate Limits below)
  3. Model Not Available: Confirm the model ID matches one listed above
  4. Context Length Exceeded: Reduce the input length or maxTokens

Debugging Steps

  1. Verify API key format and validity
  2. Check rate limits
  3. Confirm model name spelling
  4. Check the Groq status page for outages
  5. Monitor usage quotas

Rate Limits

  • Free Tier: 30 requests per minute
  • Paid Tier: Higher rate limits based on subscription plan
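
When a 429 arrives, backing off and retrying is usually enough. Here is a minimal bash sketch of exponential backoff around the verification request from earlier (the loop itself is illustrative, not part of any Groq SDK):

bash
# Retry with exponential backoff on HTTP 429 (rate limited)
for delay in 1 2 4 8; do
  status=$(curl -s -o /tmp/groq_resp.json -w "%{http_code}" \
    https://api.groq.com/openai/v1/chat/completions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "ping"}]}')
  [ "$status" != "429" ] && break   # success or a non-rate-limit error: stop retrying
  sleep "$delay"                    # rate limited: wait, then try again
done
cat /tmp/groq_resp.json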

Best Practices

1. Model Selection

  • Real-Time Applications: Use llama-3.1-8b-instant
  • Complex Tasks: Use llama-3.1-70b-versatile
  • Reasoning Tasks: Use llama-3.1-405b-reasoning
  • Code Completion: Use instant model with low temperature

2. Performance Optimization

  • Leverage Groq's speed advantage with streaming responses (see the sketch after this list)
  • Choose appropriate model size for different tasks
  • Set reasonable maxTokens limits
  • Implement effective caching strategies
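
Streaming is the easiest way to surface Groq's latency advantage in a UI: tokens render as they arrive instead of after the full completion. Since the API is OpenAI-compatible, setting "stream": true returns server-sent events, as in this sketch:

bash
# Request a streamed completion; -N disables curl's output buffering
curl -N https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "Explain streaming in one sentence."}],
    "stream": true
  }'
# Each "data: {...}" line carries a delta chunk; the stream ends with "data: [DONE]"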

3. Cost Control

  • Monitor API usage
  • Use smaller models for simple tasks
  • Set quota alerts
  • Optimize prompt length

4. Security

  • Use environment variables to store API keys
  • Rotate keys regularly
  • Monitor unusual usage

Use Cases

Groq is particularly suitable for:

  • Real-Time Chat Applications - Millisecond response times
  • Code Completion - Fast instant suggestions
  • Interactive Education - Smooth learning experience
  • Game AI - Real-time decision-making and dialogue
  • Customer Service Bots - Instant customer responses