LLaMA Stack

LLaMA Stack is a complete solution built around LLaMA models, providing model serving, management, and deployment.

Supported Models

  • llama-stack-default - Default LLaMA model
  • llama-stack-instruct - Instruction-tuned model
  • llama-stack-chat - Chat-optimized model
  • llama-stack-code - Code-specialized model

Configuration

Basic Configuration

Configure models in config.yaml or in ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "llamastack-default"
    provider: "llamastack"
    model: "llama-stack-default"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Instruction-Tuned Configuration

yaml
models:
  - name: "llamastack-instruct"
    provider: "llamastack"
    model: "llama-stack-instruct"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2048

Multi-Model Configuration

yaml
models:
  - name: "llamastack-chat"
    provider: "llamastack"
    model: "llama-stack-chat"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

  - name: "llamastack-code"
    provider: "llamastack"
    model: "llama-stack-code"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "llamastack"
  • model: Model name (one of the supported models listed above)
  • apiKey: LLaMA Stack API key
  • apiBase: LLaMA Stack server address (e.g. http://localhost:5000)

Optional Fields

  • roles: Model roles; any of [chat, edit, apply, autocomplete]
  • defaultCompletionOptions:
    • temperature: Controls randomness (0-1)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling threshold
    • topK: Number of highest-probability candidates to sample from
    • frequencyPenalty: Penalizes tokens in proportion to how often they have already appeared
    • presencePenalty: Penalizes any token that has already appeared
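
A single entry combining the required fields with every optional field above might look like the sketch below; the topP, topK, and penalty values are illustrative placeholders, not recommended settings.

yaml
models:
  - name: "llamastack-full"
    provider: "llamastack"
    model: "llama-stack-default"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat", "edit", "apply", "autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7       # randomness (0-1)
      maxTokens: 4096        # generation cap
      topP: 0.9              # nucleus sampling threshold (placeholder)
      topK: 40               # candidate pool size (placeholder)
      frequencyPenalty: 0.0  # raise to discourage verbatim repetition
      presencePenalty: 0.0   # raise to discourage reusing tokens at all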

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export LLAMASTACK_API_KEY="your-llamastack-api-key"

Getting an API Key

  1. Visit the LLaMA Stack service console
  2. Register and log in to your account
  3. Navigate to the API keys page
  4. Generate a new API key
  5. Save the key in an environment variable (see the snippet below)
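
A minimal way to complete step 5, assuming a bash shell (use ~/.zshrc for zsh):

bash
# Persist the key, reload the profile, and verify it is set
echo 'export LLAMASTACK_API_KEY="your-llamastack-api-key"' >> ~/.bashrc
source ~/.bashrc
printenv LLAMASTACK_API_KEY   # should print the key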

Use Case Configurations

General Chat

yaml
models:
  - name: "general-chat"
    provider: "llamastack"
    model: "llama-stack-chat"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Code Generation

yaml
models:
  - name: "code-gen"
    provider: "llamastack"
    model: "llama-stack-code"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096

Instruction Tasks

yaml
models:
  - name: "instruction"
    provider: "llamastack"
    model: "llama-stack-instruct"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2048

Troubleshooting

Common Errors

  1. 401 Unauthorized: Check that the API key is correct and exported
  2. Connection Refused: Confirm the server is running and the apiBase address is correct
  3. 503 Service Unavailable: The server is overloaded or under maintenance; retry later

Debugging Steps

  1. Verify the API key's format and validity
  2. Check the server address and port
  3. Confirm network connectivity to the server
  4. Check the server logs for errors
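
The quick checks below cover steps 1-3 from the local machine; they assume the default address used in the examples on this page (http://localhost:5000) and only verify that the key is set and the port answers.

bash
# Step 1: is the key exported in the current shell?
printenv LLAMASTACK_API_KEY || echo "LLAMASTACK_API_KEY is not set"

# Steps 2-3: does the configured host/port accept connections?
curl -sS -o /dev/null -w "HTTP %{http_code}\n" http://localhost:5000 \
  || echo "connection failed - is the server running?"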

Best Practices

1. Model Selection

  • Use the chat model for conversational tasks
  • Use the code model for code generation and editing
  • Use the instruct model for instruction-following tasks

2. Performance Optimization

  • Set reasonable timeout values (a hedged sketch follows this list)
  • Choose a model sized appropriately for the task
  • Cache repeated requests where possible
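
If your ByteBuddy version supports per-model request options, a timeout might be configured roughly as sketched below; requestOptions and timeout are assumed field names that this page does not document, so verify them against your version's configuration schema.

yaml
models:
  - name: "llamastack-chat"
    provider: "llamastack"
    model: "llama-stack-chat"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    # Assumed fields - not confirmed by this page; check your schema
    requestOptions:
      timeout: 60   # seconds to wait before abandoning a request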

3. Security

  • Use environment variables to store API keys
  • Rotate keys regularly
  • Monitor unusual usage

4. Cost Control

  • Monitor API usage
  • Set maxTokens limits to cap response length
  • Prefer smaller models when they meet quality needs