# DeepInfra

DeepInfra provides high-performance AI model inference as a service, with a focus on fast response times and cost-effectiveness.
## Supported Models

### Meta Llama
- `meta-llama/Llama-2-70b-chat-hf` - Llama 2 70B chat model
- `meta-llama/Llama-2-13b-chat-hf` - Llama 2 13B chat model
- `meta-llama/Llama-2-7b-chat-hf` - Llama 2 7B chat model
### Mistral
- `mistralai/Mixtral-8x7B-Instruct-v0.1` - Mixtral 8x7B mixture-of-experts instruct model
- `mistralai/Mistral-7B-Instruct-v0.2` - Mistral 7B instruct model
### Other Models
- `HuggingFaceH4/zephyr-7b-beta` - Zephyr 7B model
- `NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO` - Nous Hermes 2 Mixtral model
## Configuration

### Basic Configuration
Configure models in `config.yaml` or `~/.bytebuddy/config.yaml`:
```yaml
models:
  - name: "deepinfra-llama"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```

### High-Performance Configuration
```yaml
models:
  - name: "deepinfra-mixtral"
    provider: "deepinfra"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 8192
```

### Multi-Model Configuration
```yaml
models:
  - name: "deepinfra-llama-70b"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
  - name: "deepinfra-mistral"
    provider: "deepinfra"
    model: "mistralai/Mistral-7B-Instruct-v0.2"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4096
```

## Configuration Fields
### Required Fields
- `name`: Unique identifier for the model configuration
- `provider`: Set to `"deepinfra"`
- `model`: Model identifier (format: `organization/model-name`)
- `apiKey`: DeepInfra API key
### Optional Fields
- `roles`: Model roles (`chat`, `edit`, `apply`, `autocomplete`)
- `defaultCompletionOptions`:
  - `temperature`: Controls randomness (0-2)
  - `maxTokens`: Maximum number of tokens to generate
  - `topP`: Nucleus sampling parameter
  - `topK`: Number of sampling candidates
  - `repetitionPenalty`: Penalty applied to repeated tokens
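The sketch below combines the optional fields in a single model entry; the values are illustrative starting points, not recommendations:

```yaml
models:
  - name: "deepinfra-tuned"
    provider: "deepinfra"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7        # randomness, 0-2
      maxTokens: 4096         # cap on generated tokens
      topP: 0.9               # nucleus sampling cutoff
      topK: 40                # number of sampling candidates
      repetitionPenalty: 1.1  # values above 1 discourage repetition
```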
## Environment Variables

```bash
# ~/.bashrc or ~/.zshrc
export DEEPINFRA_API_KEY="your-deepinfra-api-key"
```

## Getting an API Key
1. Visit the [DeepInfra website](https://deepinfra.com)
2. Register and log in to your account
3. Navigate to the API keys page
4. Generate a new API key
5. Save the key in an environment variable
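To confirm the key works before wiring it into your config, you can send a minimal request to DeepInfra's OpenAI-compatible chat completions endpoint. This is a quick smoke test, assuming the `https://api.deepinfra.com/v1/openai/chat/completions` route; check DeepInfra's API docs for the current paths:

```bash
# Prints the response body followed by the HTTP status code.
# 200 means the key and model work; 401 points to a bad or missing key.
curl -s -w "\nHTTP status: %{http_code}\n" \
  https://api.deepinfra.com/v1/openai/chat/completions \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 8
  }'
```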
## Use Case Configurations

### General Chat
```yaml
models:
  - name: "chat-bot"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-13b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
```

### Code Generation
```yaml
models:
  - name: "code-gen"
    provider: "deepinfra"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096
```

### Fast Response
```yaml
models:
  - name: "fast-response"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-7b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 1024
```

## Troubleshooting
### Common Errors
- 401 Unauthorized: Check that the API key is correct
- 429 Too Many Requests: Rate limit reached; back off and retry
- Model Not Available: Verify the model identifier format (`organization/model-name`)
- Timeout Error: Network connection issue or model loading timeout
### Debugging Steps
1. Verify the API key format and validity
2. Check that the model identifier is correct (see the listing sketch below)
3. Confirm the network connection is working
4. Check the DeepInfra status page
5. Review rate limits and quotas
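For step 2, one option is to list the models your account can reach. This sketch assumes DeepInfra's OpenAI-compatible API exposes the standard `/models` listing route; adjust the path if DeepInfra's docs say otherwise:

```bash
# List available model IDs and filter for the family you configured.
curl -s https://api.deepinfra.com/v1/openai/models \
  -H "Authorization: Bearer $DEEPINFRA_API_KEY" \
  | grep -o '"id": *"[^"]*"' \
  | grep -i "llama"
```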
## Best Practices

1. Model Selection
   - Complex tasks: use 70B or Mixtral 8x7B models
   - General chat: use 13B models
   - Fast response: use 7B models
2. Performance Optimization
   - Choose a model size that balances quality and speed
   - Set reasonable timeout values (see the sketch after this list)
   - Implement request caching
3. Cost Control
   - Monitor API usage
   - Match model size to task complexity
   - Set `maxTokens` limits
4. Security
   - Store API keys in environment variables
   - Rotate keys regularly
   - Monitor for unusual usage
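As a concrete example of the timeout and cost advice above, some clients allow per-model request options alongside the completion defaults. The `requestOptions.timeout` field below is hypothetical, so check whether your client supports an equivalent setting before copying it:

```yaml
models:
  - name: "deepinfra-mixtral"
    provider: "deepinfra"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    # Hypothetical field: not every client supports per-model request options.
    requestOptions:
      timeout: 60  # seconds; large models can take longer on a cold start
    defaultCompletionOptions:
      maxTokens: 4096  # capping output tokens also caps per-request cost
```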