LLaMA Stack
LLaMA Stack is a complete solution built around LLaMA models, providing model serving, management, and deployment.
Supported Models
- llama-stack-default - Default LLaMA model
- llama-stack-instruct - Instruction-tuned model
- llama-stack-chat - Chat-optimized model
- llama-stack-code - Code-specific model
Configuration
Basic Configuration
Configure in config.yaml or ~/.bytebuddy/config.yaml:
```yaml
models:
  - name: "llamastack-default"
    provider: "llamastack"
    model: "llama-stack-default"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```
Instruction-Tuned Configuration
```yaml
models:
  - name: "llamastack-instruct"
    provider: "llamastack"
    model: "llama-stack-instruct"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2048
```
Multi-Model Configuration
```yaml
models:
  - name: "llamastack-chat"
    provider: "llamastack"
    model: "llama-stack-chat"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
  - name: "llamastack-code"
    provider: "llamastack"
    model: "llama-stack-code"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 2048
```
Configuration Fields
Required Fields
- name: Unique identifier for the model configuration
- provider: Set to "llamastack"
- model: Model name (one of the supported models above)
- apiKey: LLaMA Stack API key
- apiBase: LLaMA Stack server address
Optional Fields
- roles: Model roles: chat, edit, apply, autocomplete
- defaultCompletionOptions (illustrated below):
  - temperature: Controls randomness (0-1)
  - maxTokens: Maximum tokens to generate
  - topP: Nucleus sampling parameter
  - topK: Sampling candidates count
  - frequencyPenalty: Frequency penalty
  - presencePenalty: Presence penalty
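The snippet below shows all of the optional completion options together. The values are illustrative placeholders rather than tuned recommendations, and the name "llamastack-tuned" is made up for this example:

```yaml
models:
  - name: "llamastack-tuned"          # illustrative name
    provider: "llamastack"
    model: "llama-stack-default"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat", "edit", "apply", "autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7      # higher values produce more random output
      maxTokens: 4096       # hard cap on generated tokens
      topP: 0.9             # nucleus sampling threshold
      topK: 40              # number of sampling candidates
      frequencyPenalty: 0.0 # penalize frequently repeated tokens
      presencePenalty: 0.0  # penalize tokens already present
```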
Environment Variables
```bash
# ~/.bashrc or ~/.zshrc
export LLAMASTACK_API_KEY="your-llamastack-api-key"
```
Getting API Key
- Visit the LLaMA Stack service console
- Register and log in to your account
- Navigate to the API keys page
- Generate a new API key
- Save the key to an environment variable (see the check below)
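After reloading your shell configuration, a quick check confirms the variable is visible to new processes (assumes bash or zsh):

```bash
source ~/.bashrc   # or: source ~/.zshrc
# Prints the key if it is set; fails with "not set" otherwise
echo "${LLAMASTACK_API_KEY:?not set}"
```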
Use Case Configurations
General Chat
```yaml
models:
  - name: "general-chat"
    provider: "llamastack"
    model: "llama-stack-chat"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
```
Code Generation
```yaml
models:
  - name: "code-gen"
    provider: "llamastack"
    model: "llama-stack-code"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096
```
Instruction Tasks
```yaml
models:
  - name: "instruction"
    provider: "llamastack"
    model: "llama-stack-instruct"
    apiKey: "${LLAMASTACK_API_KEY}"
    apiBase: "http://localhost:5000"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2048
```
Troubleshooting
Common Errors
- 401 Unauthorized: The API key is missing or invalid; check LLAMASTACK_API_KEY
- Connection Refused: Confirm the server is running and apiBase points to it
- 503 Service Unavailable: Server is overloaded or under maintenance
Debugging Steps
- Verify the API key format and validity
- Check the server address and port (see the connectivity sketch below)
- Confirm the network connection works
- Review the server logs
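The commands below exercise the first three steps from the shell. They assume the server is reachable at the apiBase from your config; the Authorization header scheme is a common convention, not a documented LLaMA Stack requirement, so check your server's docs for the exact scheme it expects:

```bash
# Is anything listening at apiBase? (-i prints the status line and headers)
curl -i http://localhost:5000

# Does the key authenticate? (Bearer header shown here is an assumption)
curl -i -H "Authorization: Bearer ${LLAMASTACK_API_KEY}" http://localhost:5000
```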
Best Practices
1. Model Selection
- Use chat model for dialogue tasks
- Use code model for code tasks
- Use instruct model for instruction tasks
2. Performance Optimization
- Set reasonable timeout values (the timing sketch below can help calibrate)
- Choose the appropriate model for the task
- Implement request caching for repeated prompts
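To calibrate timeout values, you can measure end-to-end latency against the server with curl's write-out timers (assumes the apiBase from your config):

```bash
# -w prints timing after the transfer; -s silences progress; -o discards the body
curl -o /dev/null -s -w "connect: %{time_connect}s  total: %{time_total}s\n" \
  http://localhost:5000
```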
3. Security
- Use environment variables to store API keys (one approach is sketched below)
- Rotate keys regularly
- Monitor for unusual usage
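One way to keep the key out of world-readable shell startup files; a sketch, with paths you should adapt to your own shell:

```bash
# Store the key in a file only your user can read
echo 'export LLAMASTACK_API_KEY="your-llamastack-api-key"' > ~/.llamastack_env
chmod 600 ~/.llamastack_env

# Load it from your shell startup file
echo 'source ~/.llamastack_env' >> ~/.bashrc
```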
4. Cost Control
- Monitor API usage
- Set maxTokens limits
- Choose appropriate models