Groq

Groq provides extremely fast AI inference, focused on real-time responses and low-latency applications.

Supported Models

Llama 3.1

  • llama-3.1-405b-reasoning - Largest model, specialized for complex reasoning
  • llama-3.1-70b-versatile - General-purpose model balancing quality and speed
  • llama-3.1-8b-instant - Small model optimized for the fastest responses

Llama 3

  • llama3-70b-8192 - 70B parameter model, 8,192-token context
  • llama3-8b-8192 - 8B parameter model, 8,192-token context

Mixtral

  • mixtral-8x7b-32768 - Mixtral 8x7B mixture-of-experts (MoE) model, 32,768-token context

Other Models

  • gemma-7b-it - Google Gemma 7B instruction-tuned model

Configuration

Basic Configuration

Configure in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "groq-llama"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Real-Time Chat Configuration

yaml
models:
  - name: "groq-instant"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Multi-Model Configuration

yaml
models:
  - name: "groq-instant"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 1024

  - name: "groq-versatile"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

  - name: "groq-reasoning"
    provider: "groq"
    model: "llama-3.1-405b-reasoning"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 8192

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "groq"
  • model: Groq model ID (see Supported Models above)
  • apiKey: Groq API key

Optional Fields

  • roles: Model roles [chat, edit, apply, autocomplete]
  • defaultCompletionOptions:
    • temperature: Controls randomness (0-2; lower is more deterministic)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling cutoff
    • frequencyPenalty: Penalizes tokens proportionally to how often they have already appeared
    • presencePenalty: Penalizes tokens that have appeared at all, encouraging new topics
    • stopSequences: Strings that end generation when produced
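
Putting these together, here is a sketch of a single model entry with every optional field set (the values are illustrative, not recommendations):

yaml
models:
  - name: "groq-full-options"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7        # randomness, 0-2
      maxTokens: 4096         # cap on generated tokens
      topP: 0.9               # nucleus sampling cutoff
      frequencyPenalty: 0.2   # discourage frequent repeats
      presencePenalty: 0.1    # discourage reusing any seen token
      stopSequences: ["\n\n"] # stop generation at these strings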

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export GROQ_API_KEY="your-groq-api-key"

Getting API Key

  1. Visit the Groq Console at https://console.groq.com
  2. Register or log in to your account
  3. Navigate to the API Keys page
  4. Create a new API key
  5. Save the key to an environment variable (as shown above)
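
To confirm the key works before wiring it into config, one quick check is a minimal request against Groq's OpenAI-compatible endpoint (the model ID can be any from the list above):

bash
# Minimal request to verify the API key; expects a JSON chat completion back
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8
  }'

A 401 here means the key is wrong or missing; a 200 with a JSON body means the key is live.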

Use Case Configurations

Real-Time Chat

yaml
models:
  - name: "real-time-chat"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Code Generation

yaml
models:
  - name: "code-gen"
    provider: "groq"
    model: "llama-3.1-70b-versatile"
    apiKey: "${GROQ_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096

Complex Reasoning

yaml
models:
  - name: "reasoning"
    provider: "groq"
    model: "llama-3.1-405b-reasoning"
    apiKey: "${GROQ_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 16384

Code Completion

yaml
models:
  - name: "autocomplete"
    provider: "groq"
    model: "llama-3.1-8b-instant"
    apiKey: "${GROQ_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 512

Speed Advantages

Groq's LPU (Language Processing Unit) technology provides industry-leading inference speed:

  • Real-Time Response: Millisecond-level latency
  • High Throughput: Supports massive concurrent requests
  • Stable Performance: Consistent latency across requests
  • Low Cost: High throughput lowers per-token cost

Troubleshooting

Common Errors

  1. 401 Unauthorized: The API key is missing or invalid
  2. 429 Too Many Requests: Rate limit reached; back off and retry (see Rate Limits below)
  3. Model Not Available: Confirm the model ID matches one listed above
  4. Context Length Exceeded: Reduce the input length or maxTokens

Debugging Steps

  1. Verify API key format and validity
  2. Check rate limits
  3. Confirm model name spelling
  4. Check the Groq status page for outages
  5. Monitor usage quotas

Rate Limits

  • Free Tier: 30 requests per minute
  • Paid Tier: Higher rate limits based on subscription plan
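
When a 429 arrives, backing off and retrying is usually enough. Here is a minimal bash sketch of exponential backoff around the verification request from earlier (the loop itself is illustrative, not part of any Groq SDK):

bash
# Retry with exponential backoff on HTTP 429 (rate limited)
for delay in 1 2 4 8; do
  status=$(curl -s -o /tmp/groq_resp.json -w "%{http_code}" \
    https://api.groq.com/openai/v1/chat/completions \
    -H "Authorization: Bearer $GROQ_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "llama-3.1-8b-instant", "messages": [{"role": "user", "content": "ping"}]}')
  [ "$status" != "429" ] && break   # success or a non-rate-limit error: stop retrying
  sleep "$delay"                    # rate limited: wait, then try again
done
cat /tmp/groq_resp.json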

Best Practices

1. Model Selection

  • Real-Time Applications: Use llama-3.1-8b-instant
  • Complex Tasks: Use llama-3.1-70b-versatile
  • Reasoning Tasks: Use llama-3.1-405b-reasoning
  • Code Completion: Use instant model with low temperature

2. Performance Optimization

  • Leverage Groq's speed advantage with streaming responses (see the sketch after this list)
  • Choose appropriate model size for different tasks
  • Set reasonable maxTokens limits
  • Implement effective caching strategies
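
Streaming is the easiest way to surface Groq's latency advantage in a UI: tokens render as they arrive instead of after the full completion. Since the API is OpenAI-compatible, setting "stream": true returns server-sent events, as in this sketch:

bash
# Request a streamed completion; -N disables curl's output buffering
curl -N https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b-instant",
    "messages": [{"role": "user", "content": "Explain streaming in one sentence."}],
    "stream": true
  }'
# Each "data: {...}" line carries a delta chunk; the stream ends with "data: [DONE]"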

3. Cost Control

  • Monitor API usage
  • Use smaller models for simple tasks
  • Set quota alerts
  • Optimize prompt length

4. Security

  • Use environment variables to store API keys
  • Rotate keys regularly
  • Monitor unusual usage

Use Cases

Groq is particularly suitable for:

  • Real-Time Chat Applications - Millisecond response times
  • Code Completion - Fast instant suggestions
  • Interactive Education - Smooth learning experience
  • Game AI - Real-time decision-making and dialogue
  • Customer Service Bots - Instant customer responses