Together AI

Together AI provides hosted inference for open-source models, focusing on high-performance, cost-effective model access for developers.

Supported Models

Meta Llama

  • meta-llama/Llama-2-70b-chat-hf - Llama 2 70B
  • meta-llama/Llama-2-13b-chat-hf - Llama 2 13B
  • meta-llama/Llama-3-70b-instruct-hf - Llama 3 70B
  • meta-llama/Llama-3-8b-instruct-hf - Llama 3 8B

Mistral

  • mistralai/Mixtral-8x7B-Instruct-v0.1 - Mixtral 8x7B (Mixture of Experts)
  • mistralai/Mistral-7B-Instruct-v0.1 - Mistral 7B

Code Models

  • codellama/CodeLlama-34b-Instruct-hf - CodeLlama 34B
  • codellama/CodeLlama-13b-Instruct-hf - CodeLlama 13B
  • bigcode/starcoder - StarCoder

Other Open-Source Models

  • WizardLM/WizardLM-70B-V1.0 - WizardLM
  • togethercomputer/RedPajama-INCITE-7B-Instruct - RedPajama

Configuration

Basic Configuration

Configure the provider in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "together-llama"
    provider: "together"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Code Generation Configuration

yaml
models:
  - name: "together-code"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Multi-Model Configuration

yaml
models:
  - name: "together-llama-70b"
    provider: "together"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

  - name: "together-mixtral"
    provider: "together"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 4096

  - name: "together-codellama"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "together"
  • model: Model identifier (format: organization/model-name)
  • apiKey: Your Together AI API key (referenced from an environment variable in the examples above)

Optional Fields

  • roles: Roles the model can serve: chat, edit, apply, autocomplete
  • defaultCompletionOptions (see the example below):
    • temperature: Controls randomness (0-2; lower is more deterministic)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling threshold
    • topK: Number of highest-probability tokens considered at each step
    • repetitionPenalty: Penalizes repeated tokens (values above 1.0 reduce repetition)
    • stopSequences: Strings that terminate generation when produced
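
Putting these together, a single entry that sets every optional field might look like the following sketch (the entry name and option values are illustrative, not recommendations):

yaml
models:
  - name: "together-tuned"            # illustrative name
    provider: "together"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7                # randomness (0-2)
      maxTokens: 4096                 # cap on generated tokens
      topP: 0.9                       # nucleus sampling threshold
      topK: 50                        # candidates considered per step
      repetitionPenalty: 1.1          # >1.0 discourages repetition
      stopSequences: ["</s>"]         # strings that end generation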

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export TOGETHER_API_KEY="your-together-api-key"

Getting an API Key

  1. Visit the Together AI website
  2. Register and log in to your account
  3. Generate a new key on the API Keys page
  4. Save the key in the TOGETHER_API_KEY environment variable (see above)

Use Case Configurations

High-Performance Chat

yaml
models:
  - name: "high-quality-chat"
    provider: "together"
    model: "meta-llama/Llama-3-70b-instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Cost Optimization

yaml
models:
  - name: "cost-optimized"
    provider: "together"
    model: "meta-llama/Llama-2-13b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Code Assistant

yaml
models:
  - name: "code-assistant"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Performance Features

High Throughput

  • Handles large numbers of concurrent requests
  • Optimized inference engine
  • Intelligent load balancing

Low Latency

  • Globally distributed inference servers
  • Intelligent request routing
  • Cache optimization

Scalability

  • Dynamic resource allocation
  • Auto-scaling
  • Elastic computing resources

Troubleshooting

Common Errors

  1. 401 Unauthorized: Check that the API key is correct and active
  2. 429 Too Many Requests: You have hit a rate limit; slow down or retry with backoff
  3. Model Not Found: Confirm the model identifier uses the organization/model-name format and that the model is still offered
  4. 503 Service Unavailable: The service is temporarily down; retry later or check the status page

Debugging Steps

  1. Verify the API key's format and validity
  2. Check that the model identifier is correct
  3. Confirm your network connection is working
  4. Check the Together AI status page
  5. Review your rate limits and quotas

Best Practices

1. Model Selection

  • Choose model size based on task complexity
  • Prefer the latest model versions
  • Balance cost against output quality
  • Use CodeLlama for code tasks (for example, the autocomplete entry sketched below)
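
For example, a dedicated autocomplete entry can pair the smaller CodeLlama with low-latency settings. This is a sketch: the entry name and option values are illustrative, and whether autocomplete is used depends on your client setup:

yaml
models:
  - name: "together-autocomplete"     # illustrative name
    provider: "together"
    model: "codellama/CodeLlama-13b-Instruct-hf"  # smaller model for faster completions
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.2                # near-deterministic suggestions
      maxTokens: 256                  # short completions keep latency and cost low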

2. Parameter Optimization

  • Lower the temperature for more consistent output
  • Set sensible maxTokens limits
  • Use stop sequences to bound output (see the example after this list)
  • Raise the repetition penalty if output repeats itself
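
Applied together, these guidelines might look like the following entry for code editing (the name and values are illustrative starting points to tune from):

yaml
models:
  - name: "tuned-code-edit"           # illustrative name
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.1                # low temperature for consistent edits
      maxTokens: 1024                 # bound output to what an edit needs
      repetitionPenalty: 1.1          # gently discourage repeated lines
      stopSequences: ["```"]          # stop at the end of a fenced code block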

3. Cost Control

  • Monitor API usage
  • Choose the smallest model that meets your quality bar
  • Keep prompts as short as the task allows
  • Set quota alerts

4. Performance Optimization

  • Enable streaming responses
  • Cache responses to repeated requests
  • Batch similar requests
  • Choose the server region nearest to your users