# Azure OpenAI
Azure OpenAI Service provides access to OpenAI's GPT models through an OpenAI-compatible API hosted on Microsoft Azure.
## Supported Models

- `gpt-4o` - Latest optimized GPT-4 version
- `gpt-4` - Standard GPT-4
- `gpt-4-32k` - Long-context version
- `gpt-35-turbo` - GPT-3.5 Turbo
- `gpt-35-turbo-16k` - Long-context version
## Configuration

### Basic Configuration

Configure models in `config.yaml` or `~/.bytebuddy/config.yaml`:
```yaml
models:
  - name: "azure-gpt-4"
    provider: "azure"
    model: "gpt-4"
    apiKey: "${AZURE_OPENAI_API_KEY}"
    apiBase: "https://your-resource.openai.azure.com/"
    roles: ["chat", "edit"]
    env:
      deploymentName: "gpt-4-deployment"
      apiVersion: "2024-02-15-preview"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4000
```

### Multi-Model Configuration
```yaml
models:
  - name: "azure-gpt-4"
    provider: "azure"
    model: "gpt-4"
    apiKey: "${AZURE_OPENAI_API_KEY}"
    apiBase: "https://your-resource.openai.azure.com/"
    roles: ["chat", "edit"]
    env:
      deploymentName: "gpt4-deployment"
      apiVersion: "2024-02-15-preview"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4000
  - name: "azure-gpt-35-turbo"
    provider: "azure"
    model: "gpt-35-turbo"
    apiKey: "${AZURE_OPENAI_API_KEY}"
    apiBase: "https://your-resource.openai.azure.com/"
    roles: ["chat"]
    env:
      deploymentName: "gpt35-turbo-deployment"
      apiVersion: "2024-02-15-preview"
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4000
```

## Configuration Fields
### Required Fields

- `name`: Unique identifier for the model configuration
- `provider`: Set to `"azure"`
- `model`: Model name (e.g., `gpt-4`, `gpt-35-turbo`)
- `apiKey`: Azure OpenAI API key
- `apiBase`: Azure OpenAI resource endpoint
### Environment Configuration (`env`)

Azure requires deployment-specific parameters via the `env` field (see the example after this list):

- `deploymentName`: Deployment name (created in the Azure Portal)
- `apiVersion`: API version (recommended: `2024-02-15-preview`)
- `resourceName`: Resource name (optional)
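For reference, a minimal `env` block might look like the following; the deployment and resource names are placeholders for values from your own Azure Portal:

```yaml
env:
  deploymentName: "gpt4-deployment"   # placeholder; must match a deployment in the Azure Portal
  apiVersion: "2024-02-15-preview"
  resourceName: "your-resource"       # optional
```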
### Optional Fields

- `roles`: Model roles (`chat`, `edit`, `apply`, `autocomplete`)
- `defaultCompletionOptions`:
  - `temperature`: Controls randomness (0-2)
  - `maxTokens`: Maximum number of tokens in the response
  - `topP`: Nucleus sampling parameter
  - `frequencyPenalty`: Frequency penalty
  - `presencePenalty`: Presence penalty
- `requestOptions` (see the sketch after this list):
  - `timeout`: Request timeout (milliseconds)
  - `verifySsl`: Whether to verify SSL certificates
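As a sketch, the optional fields slot into a model entry like this; the values shown are illustrative, not required defaults:

```yaml
models:
  - name: "azure-gpt-4"
    provider: "azure"
    model: "gpt-4"
    apiKey: "${AZURE_OPENAI_API_KEY}"
    apiBase: "https://your-resource.openai.azure.com/"
    roles: ["chat", "edit"]
    env:
      deploymentName: "gpt4-deployment"
      apiVersion: "2024-02-15-preview"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4000
      topP: 0.95             # nucleus sampling
      frequencyPenalty: 0.0
      presencePenalty: 0.0
    requestOptions:
      timeout: 60000         # 60 seconds, in milliseconds
      verifySsl: true
```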
## Environment Variables

```bash
# ~/.bashrc or ~/.zshrc
export AZURE_OPENAI_API_KEY="your-azure-api-key"
```

## Getting an API Key
1. Log in to the Azure Portal
2. Create or locate your Azure OpenAI resource
3. Get the API key and endpoint from the "Keys and Endpoint" section
4. Create a model deployment in the "Deployments" section
5. Note the deployment name for the configuration (the snippet below shows where each value goes)
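The portal values map onto the configuration fields like so; the endpoint and deployment name are placeholders:

```yaml
apiKey: "${AZURE_OPENAI_API_KEY}"                   # key from "Keys and Endpoint"
apiBase: "https://your-resource.openai.azure.com/"  # endpoint from "Keys and Endpoint"
env:
  deploymentName: "gpt4-deployment"                 # name chosen in "Deployments"
```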
## Use Case Configurations

### Code Development
```yaml
models:
  - name: "code-assistant"
    provider: "azure"
    model: "gpt-4"
    apiKey: "${AZURE_OPENAI_API_KEY}"
    apiBase: "https://your-resource.openai.azure.com/"
    roles: ["chat", "edit"]
    env:
      deploymentName: "gpt4-deployment"
      apiVersion: "2024-02-15-preview"
    defaultCompletionOptions:
      temperature: 0.1   # low temperature for more deterministic code output
      maxTokens: 2000
```

### Quick Chat
```yaml
models:
  - name: "quick-chat"
    provider: "azure"
    model: "gpt-35-turbo"
    apiKey: "${AZURE_OPENAI_API_KEY}"
    apiBase: "https://your-resource.openai.azure.com/"
    roles: ["chat"]
    env:
      deploymentName: "gpt35-deployment"
      apiVersion: "2024-02-15-preview"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000
```

## Troubleshooting
### Common Errors

- `401 Unauthorized`: Check that the API key is correct
- `404 Not Found`: Verify the deployment name, resource endpoint, and API version
- `429 Too Many Requests`: Rate limit reached; wait and retry
- `DeploymentNotFound`: Check that `deploymentName` is correct (see the note below on how the request URL is assembled)
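Most `404` and `DeploymentNotFound` errors come from a mismatch in how the request URL is assembled. Azure OpenAI's REST convention combines three of the configuration fields; the deployment name below is a placeholder:

```yaml
# Resulting request URL (Azure OpenAI REST convention):
#   {apiBase}openai/deployments/{deploymentName}/chat/completions?api-version={apiVersion}
apiBase: "https://your-resource.openai.azure.com/"
env:
  deploymentName: "gpt4-deployment"   # must match the name in the Deployments section exactly
  apiVersion: "2024-02-15-preview"
```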
### Debugging Steps

1. Verify the API key and endpoint URL
2. Confirm the deployment name exists in the Azure Portal
3. Check that the API version is supported
4. Confirm quotas and rate limits
## Best Practices

1. Security
   - Use environment variables to store API keys
   - Rotate API keys regularly
   - Use Azure Key Vault to manage keys
2. Performance Optimization
   - Choose the appropriate model for the task
   - Set reasonable `maxTokens` limits
   - Use streaming responses for better UX
3. Cost Control
   - Monitor usage and costs
   - Configure different deployments for different purposes
   - Use GPT-3.5 Turbo for simple tasks
4. Configuration Management
   - Use different deployments for different environments (a sketch follows this list)
   - Configure appropriate rate limits
   - Implement retry logic for transient errors
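As a sketch of the per-environment pattern, all names, keys, and endpoints below are placeholders; point each entry at the matching Azure resource:

```yaml
models:
  - name: "prod-gpt-4"
    provider: "azure"
    model: "gpt-4"
    apiKey: "${AZURE_OPENAI_API_KEY_PROD}"   # placeholder env var for the prod resource
    apiBase: "https://prod-resource.openai.azure.com/"
    env:
      deploymentName: "gpt4-prod"
      apiVersion: "2024-02-15-preview"
  - name: "dev-gpt-35-turbo"
    provider: "azure"
    model: "gpt-35-turbo"
    apiKey: "${AZURE_OPENAI_API_KEY_DEV}"    # placeholder env var for the dev resource
    apiBase: "https://dev-resource.openai.azure.com/"
    env:
      deploymentName: "gpt35-dev"
      apiVersion: "2024-02-15-preview"
```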