Together AI

Together AI provides hosted inference for open-source models, focusing on high-performance, cost-effective model access for developers.

Supported Models

Meta Llama

  • meta-llama/Llama-2-70b-chat-hf - Llama 2 70B
  • meta-llama/Llama-2-13b-chat-hf - Llama 2 13B
  • meta-llama/Llama-3-70b-instruct-hf - Llama 3 70B
  • meta-llama/Llama-3-8b-instruct-hf - Llama 3 8B

Mistral

  • mistralai/Mixtral-8x7B-Instruct-v0.1 - Mixtral 8x7B (Mixture of Experts)
  • mistralai/Mistral-7B-Instruct-v0.1 - Mistral 7B

Code Models

  • codellama/CodeLlama-34b-Instruct-hf - CodeLlama 34B
  • codellama/CodeLlama-13b-Instruct-hf - CodeLlama 13B
  • bigcode/starcoder - StarCoder

Other Open-Source Models

  • WizardLM/WizardLM-70B-V1.0 - WizardLM
  • togethercomputer/RedPajama-INCITE-7B-Instruct - RedPajama

Configuration

Basic Configuration

Configure the provider in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "together-llama"
    provider: "together"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Code Generation Configuration

yaml
models:
  - name: "together-code"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Multi-Model Configuration

yaml
models:
  - name: "together-llama-70b"
    provider: "together"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

  - name: "together-mixtral"
    provider: "together"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 4096

  - name: "together-codellama"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "together"
  • model: Model identifier (format: organization/model-name)
  • apiKey: Your Together AI API key (referenced from an environment variable in the examples above)

Optional Fields

  • roles: Roles the model can serve: chat, edit, apply, autocomplete
  • defaultCompletionOptions (see the example below):
    • temperature: Controls randomness (0-2; lower is more deterministic)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling threshold
    • topK: Number of highest-probability tokens considered at each step
    • repetitionPenalty: Penalizes repeated tokens (values above 1.0 reduce repetition)
    • stopSequences: Strings that terminate generation when produced
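
Putting these together, a single entry that sets every optional field might look like the following sketch (the entry name and option values are illustrative, not recommendations):

yaml
models:
  - name: "together-tuned"            # illustrative name
    provider: "together"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7                # randomness (0-2)
      maxTokens: 4096                 # cap on generated tokens
      topP: 0.9                       # nucleus sampling threshold
      topK: 50                        # candidates considered per step
      repetitionPenalty: 1.1          # >1.0 discourages repetition
      stopSequences: ["</s>"]         # strings that end generation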

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export TOGETHER_API_KEY="your-together-api-key"

Getting an API Key

  1. Visit the Together AI website
  2. Register and log in to your account
  3. Generate a new key on the API Keys page
  4. Save the key in the TOGETHER_API_KEY environment variable (see above)

Use Case Configurations

High-Performance Chat

yaml
models:
  - name: "high-quality-chat"
    provider: "together"
    model: "meta-llama/Llama-3-70b-instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Cost Optimization

yaml
models:
  - name: "cost-optimized"
    provider: "together"
    model: "meta-llama/Llama-2-13b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Code Assistant

yaml
models:
  - name: "code-assistant"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Performance Features

High Throughput

  • Handles large numbers of concurrent requests
  • Optimized inference engine
  • Intelligent load balancing

Low Latency

  • Globally distributed inference servers
  • Intelligent request routing
  • Cache optimization

Scalability

  • Dynamic resource allocation
  • Auto-scaling
  • Elastic computing resources

Troubleshooting

Common Errors

  1. 401 Unauthorized: Check that the API key is correct and active
  2. 429 Too Many Requests: You have hit a rate limit; slow down or retry with backoff
  3. Model Not Found: Confirm the model identifier uses the organization/model-name format and that the model is still offered
  4. 503 Service Unavailable: The service is temporarily down; retry later or check the status page

Debugging Steps

  1. Verify the API key's format and validity
  2. Check that the model identifier is correct
  3. Confirm your network connection is working
  4. Check the Together AI status page
  5. Review your rate limits and quotas

Best Practices

1. Model Selection

  • Choose model size based on task complexity
  • Prefer the latest model versions
  • Balance cost against output quality
  • Use CodeLlama for code tasks (for example, the autocomplete entry sketched below)
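
For example, a dedicated autocomplete entry can pair the smaller CodeLlama with low-latency settings. This is a sketch: the entry name and option values are illustrative, and whether autocomplete is used depends on your client setup:

yaml
models:
  - name: "together-autocomplete"     # illustrative name
    provider: "together"
    model: "codellama/CodeLlama-13b-Instruct-hf"  # smaller model for faster completions
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.2                # near-deterministic suggestions
      maxTokens: 256                  # short completions keep latency and cost low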

2. Parameter Optimization

  • Lower the temperature for more consistent output
  • Set sensible maxTokens limits
  • Use stop sequences to bound output (see the example after this list)
  • Raise the repetition penalty if output repeats itself
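
Applied together, these guidelines might look like the following entry for code editing (the name and values are illustrative starting points to tune from):

yaml
models:
  - name: "tuned-code-edit"           # illustrative name
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.1                # low temperature for consistent edits
      maxTokens: 1024                 # bound output to what an edit needs
      repetitionPenalty: 1.1          # gently discourage repeated lines
      stopSequences: ["```"]          # stop at the end of a fenced code block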

3. Cost Control

  • Monitor API usage
  • Choose the smallest model that meets your quality bar
  • Keep prompts as short as the task allows
  • Set quota alerts

4. Performance Optimization

  • Enable streaming responses
  • Cache responses to repeated requests
  • Batch similar requests
  • Choose the server region nearest to your users