DeepInfra

DeepInfra provides high-performance AI model inference services, with a focus on fast responses and cost-effectiveness.

Supported Models

Meta Llama

  • meta-llama/Llama-2-70b-chat-hf - Llama 2 70B chat model
  • meta-llama/Llama-2-13b-chat-hf - Llama 2 13B chat model
  • meta-llama/Llama-2-7b-chat-hf - Llama 2 7B chat model

Mistral

  • mistralai/Mixtral-8x7B-Instruct-v0.1 - Mixtral MoE model
  • mistralai/Mistral-7B-Instruct-v0.2 - Mistral 7B instruction model

Other Models

  • HuggingFaceH4/zephyr-7b-beta - Zephyr model
  • NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO - Nous Hermes model

Configuration

Basic Configuration

Configure in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "deepinfra-llama"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

High-Performance Configuration

yaml
models:
  - name: "deepinfra-mixtral"
    provider: "deepinfra"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.6
      maxTokens: 8192

Multi-Model Configuration

yaml
models:
  - name: "deepinfra-llama-70b"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-70b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

  - name: "deepinfra-mistral"
    provider: "deepinfra"
    model: "mistralai/Mistral-7B-Instruct-v0.2"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4096

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "deepinfra"
  • model: Model identifier (format: organization/model-name)
  • apiKey: DeepInfra API key

Optional Fields

  • roles: Model roles [chat, edit, apply, autocomplete]
  • defaultCompletionOptions:
    • temperature: Controls randomness (0-2)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling parameter
    • topK: Number of candidate tokens considered when sampling
    • repetitionPenalty: Penalty applied to repeated tokens
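
Putting the optional fields together, a configuration that sets every completion option might look like this (the values shown are illustrative, not recommendations):

```yaml
models:
  - name: "deepinfra-full-options"
    provider: "deepinfra"
    model: "mistralai/Mistral-7B-Instruct-v0.2"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
      topP: 0.9
      topK: 40
      repetitionPenalty: 1.1
```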

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export DEEPINFRA_API_KEY="your-deepinfra-api-key"
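
If you load the key programmatically, it is worth failing loudly when the variable is unset rather than sending unauthenticated requests. A minimal Python sketch (the function name is our own, not part of any SDK):

```python
import os

def get_deepinfra_api_key():
    """Read the DeepInfra API key from the environment.

    Raises a clear error if the variable is unset, so a missing key
    surfaces immediately instead of as a 401 from the API.
    """
    key = os.environ.get("DEEPINFRA_API_KEY")
    if not key:
        raise RuntimeError(
            "DEEPINFRA_API_KEY is not set; export it in ~/.bashrc or ~/.zshrc"
        )
    return key
```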

Getting API Key

  1. Visit the DeepInfra website
  2. Register and log in to your account
  3. Navigate to the API keys page
  4. Generate a new API key
  5. Save the key in an environment variable

Use Case Configurations

General Chat

yaml
models:
  - name: "chat-bot"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-13b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Code Generation

yaml
models:
  - name: "code-gen"
    provider: "deepinfra"
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096

Fast Response

yaml
models:
  - name: "fast-response"
    provider: "deepinfra"
    model: "meta-llama/Llama-2-7b-chat-hf"
    apiKey: "${DEEPINFRA_API_KEY}"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 1024

Troubleshooting

Common Errors

  1. 401 Unauthorized: Check that the API key is correct and has not expired
  2. 429 Too Many Requests: Rate limit reached; slow down or retry with backoff
  3. Model Not Available: Verify the model identifier format (organization/model-name)
  4. Timeout Error: Network connection issue or model loading timeout
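
The 429 and timeout cases above are usually handled by retrying with exponential backoff. A minimal sketch in Python (the `send_request` callable is a stand-in for your actual HTTP call, not a DeepInfra API):

```python
import time

def backoff_delays(max_retries, base_delay=1.0, cap=30.0):
    """Exponential backoff schedule: base_delay * 2**attempt, capped at `cap`."""
    return [min(base_delay * (2 ** attempt), cap) for attempt in range(max_retries)]

def call_with_retry(send_request, max_retries=4, base_delay=1.0):
    """Call send_request(), retrying on 429 responses with growing delays.

    `send_request` is a hypothetical zero-argument callable returning
    (status_code, body); wire in your real request function here.
    """
    for delay in backoff_delays(max_retries, base_delay):
        status, body = send_request()
        if status != 429:
            return status, body
        time.sleep(delay)  # wait before retrying after a rate-limit response
    return send_request()  # final attempt after exhausting the schedule
```

Capping the delay keeps the worst-case wait bounded while still spreading retries out enough to get under the rate limit.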

Debugging Steps

  1. Verify the API key's format and validity
  2. Check that the model identifier is correct
  3. Confirm the network connection is working
  4. Check the DeepInfra status page
  5. Check rate limits and quotas

Best Practices

1. Model Selection

  • Complex Tasks: Use 70B or Mixtral 8x7B models
  • General Chat: Use 13B models
  • Fast Response: Use 7B models
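
The tiers above can be encoded as a small routing helper so callers only state the task class. This is a hypothetical helper of our own, using the model identifiers listed earlier:

```python
# Map task complexity to the model tiers described above.
# 70B / Mixtral for complex tasks, 13B for general chat, 7B for speed.
MODEL_TIERS = {
    "complex": "meta-llama/Llama-2-70b-chat-hf",  # Mixtral-8x7B is an alternative
    "general": "meta-llama/Llama-2-13b-chat-hf",
    "fast": "meta-llama/Llama-2-7b-chat-hf",
}

def pick_model(task):
    """Return a model identifier for the task class, defaulting to the general tier."""
    return MODEL_TIERS.get(task, MODEL_TIERS["general"])
```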

2. Performance Optimization

  • Choose appropriate model size balancing quality and speed
  • Set reasonable timeout values
  • Implement request caching
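
Request caching can be as simple as an in-memory dictionary keyed by the request parameters. A sketch, assuming a hypothetical `send_request(model, prompt, temperature)` function standing in for the real API call:

```python
def make_cached_client(send_request):
    """Wrap a request function with an in-memory cache.

    Identical (model, prompt, temperature) triples reuse the stored
    response instead of hitting the API again.
    """
    cache = {}

    def cached(model, prompt, temperature=0.0):
        key = (model, prompt, temperature)
        if key not in cache:
            cache[key] = send_request(model, prompt, temperature)
        return cache[key]

    return cached
```

Caching makes the most sense for deterministic requests (temperature 0); with higher temperatures, identical prompts are expected to yield varied outputs, so reusing a cached reply changes behavior.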

3. Cost Control

  • Monitor API usage
  • Select model size based on task complexity
  • Set maxTokens limits
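
A lightweight way to monitor usage is to accumulate token counts per model as responses come back. A hypothetical tracker (the class is our own, not part of any SDK):

```python
class UsageTracker:
    """Accumulate token usage per model so spend can be monitored over time."""

    def __init__(self):
        self.tokens = {}  # model identifier -> total tokens consumed

    def record(self, model, prompt_tokens, completion_tokens):
        """Add one request's token counts to the model's running total."""
        used = prompt_tokens + completion_tokens
        self.tokens[model] = self.tokens.get(model, 0) + used

    def total(self):
        """Total tokens consumed across all models."""
        return sum(self.tokens.values())
```

Feeding this with the token counts reported in API responses gives a running per-model total that can be compared against a budget.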

4. Security

  • Use environment variables to store API keys
  • Rotate keys regularly
  • Monitor unusual usage