Together AI
Together AI provides hosted inference for open-source models, giving developers high-performance, cost-effective API access to them.
Supported Models
Meta Llama
- meta-llama/Llama-2-70b-chat-hf - Llama 2 70B
- meta-llama/Llama-2-13b-chat-hf - Llama 2 13B
- meta-llama/Llama-3-70b-instruct-hf - Llama 3 70B
- meta-llama/Llama-3-8b-instruct-hf - Llama 3 8B
Mistral
- mistralai/Mixtral-8x7B-Instruct-v0.1 - Mixtral MoE
- mistralai/Mistral-7B-Instruct-v0.1 - Mistral 7B
Code Models
- codellama/CodeLlama-34b-Instruct-hf - CodeLlama 34B
- codellama/CodeLlama-13b-Instruct-hf - CodeLlama 13B
- bigcode/starcoder - StarCoder
Other Open-Source Models
- WizardLM/WizardLM-70B-V1.0 - WizardLM
- togethercomputer/RedPajama-INCITE-7B-Instruct - RedPajama
Configuration
Basic Configuration
Configure in config.yaml or ~/.bytebuddy/config.yaml:
```yaml
models:
  - name: "together-llama"
    provider: "together"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```

Code Generation Configuration
```yaml
models:
  - name: "together-code"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048
```

Multi-Model Configuration
```yaml
models:
  - name: "together-llama-70b"
    provider: "together"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
  - name: "together-mixtral"
    provider: "together"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 4096
  - name: "together-codellama"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048
```

Configuration Fields
Required Fields
- name: Unique identifier for the model configuration
- provider: Set to "together"
- model: Model identifier (format: organization/model-name)
- apiKey: Together API key
Optional Fields
- roles: Model roles ([chat, edit, apply, autocomplete])
- defaultCompletionOptions:
  - temperature: Controls randomness (0-2)
  - maxTokens: Maximum number of tokens to generate
  - topP: Nucleus sampling parameter
  - topK: Number of sampling candidates
  - repetitionPenalty: Penalty applied to repeated tokens
  - stopSequences: Sequences that stop generation
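As a sketch, one model entry combining these optional fields might look like the following (the name "together-tuned" and all parameter values are illustrative, not recommendations):

```yaml
models:
  - name: "together-tuned"
    provider: "together"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7        # higher = more varied output
      maxTokens: 4096         # cap on generated tokens
      topP: 0.9               # nucleus sampling cutoff
      topK: 50                # sample from the 50 most likely tokens
      repetitionPenalty: 1.1  # values > 1 discourage repeated tokens
      stopSequences: ["</s>"] # stop generation at these strings
```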
Environment Variables
```bash
# ~/.bashrc or ~/.zshrc
export TOGETHER_API_KEY="your-together-api-key"
```

Getting API Key
- Visit Together AI
- Register and log in to your account
- Generate a new key on the API Keys page
- Save the key to an environment variable
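To verify the key works, one quick check is listing the models visible to your account through Together's OpenAI-compatible API (https://api.together.xyz/v1 is the commonly documented base URL; confirm against current Together docs):

```bash
# Sanity check: a 200 response with a JSON model list means the key is valid.
curl -s https://api.together.xyz/v1/models \
  -H "Authorization: Bearer $TOGETHER_API_KEY" | head -c 300
```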
Use Case Configurations
High-Performance Chat
```yaml
models:
  - name: "high-quality-chat"
    provider: "together"
    model: "meta-llama/Llama-3-70b-instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```

Cost Optimization
```yaml
models:
  - name: "cost-optimized"
    provider: "together"
    model: "meta-llama/Llama-2-13b-chat-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
```

Code Assistant
```yaml
models:
  - name: "code-assistant"
    provider: "together"
    model: "codellama/CodeLlama-34b-Instruct-hf"
    apiKey: "${TOGETHER_API_KEY}"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048
```

Performance Features
High Throughput
- Handles large volumes of concurrent requests
- Optimized inference engine
- Intelligent load balancing
Low Latency
- Globally distributed inference servers
- Intelligent routing selection
- Cache optimization
Scalability
- Dynamic resource allocation
- Auto-scaling
- Elastic computing resources
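As a rough illustration of the concurrency claims above, the following fires several requests in parallel and reports each status code (the endpoint is Together's OpenAI-compatible chat API; the model and payload are placeholders):

```bash
# Fire 8 requests in parallel (-P 8) and print each HTTP status code.
seq 1 8 | xargs -P 8 -I{} curl -s -o /dev/null \
  -w "request {}: %{http_code}\n" \
  https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-2-13b-chat-hf", "messages": [{"role": "user", "content": "ping"}]}'
```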
Troubleshooting
Common Errors
- 401 Unauthorized: The API key is missing or invalid; check TOGETHER_API_KEY
- 429 Too Many Requests: Rate limit reached; back off and retry
- Model Not Found: Verify the model identifier uses the organization/model-name format
- Service Unavailable: The service is temporarily down; retry after a short wait
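For transient rate-limit and availability errors, a simple exponential backoff usually suffices. A minimal shell sketch (endpoint, model, and payload are illustrative):

```bash
# Retry with exponential backoff on 429/503; give up after 5 attempts.
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o response.json -w "%{http_code}" \
    https://api.together.xyz/v1/chat/completions \
    -H "Authorization: Bearer $TOGETHER_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-2-70b-chat-hf", "messages": [{"role": "user", "content": "Hello"}]}')
  case "$status" in
    429|503) sleep $((2 ** attempt)) ;;  # wait 2s, 4s, 8s, 16s, 32s
    *) break ;;
  esac
done
cat response.json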
Debugging Steps
- Verify the API key format and validity
- Check that the model identifier is correct
- Confirm network connectivity
- Check the Together AI status page
- Review rate limits and quotas
Best Practices
1. Model Selection
- Choose model size based on task complexity
- Prefer the latest model versions
- Balance cost against capability
- Use CodeLlama for code tasks
2. Parameter Optimization
- Lower temperature for more consistent output
- Set reasonable maxTokens limits
- Use stop sequences to control where output ends
- Raise repetitionPenalty if output becomes repetitive
3. Cost Control
- Monitor API usage
- Choose appropriate model size
- Optimize prompt length
- Set quota alerts
4. Performance Optimization
- Enable streaming responses (see the sketch after this list)
- Implement request caching
- Batch similar requests
- Choose nearest server region
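A minimal streaming sketch using curl against the OpenAI-compatible endpoint; with "stream": true the response arrives as server-sent events, so tokens can be displayed as they are generated (model and prompt are placeholders):

```bash
# -N disables curl's output buffering so SSE chunks print as they arrive.
curl -N https://api.together.xyz/v1/chat/completions \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-2-70b-chat-hf",
       "messages": [{"role": "user", "content": "Write a haiku about code"}],
       "stream": true}'
```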