# Google Vertex AI
Google Vertex AI is Google Cloud Platform's unified machine learning platform, providing enterprise-grade AI model services.
## Supported Models
### Gemini Series
- gemini-pro - General-purpose text model
- gemini-pro-vision - Multimodal model (supports images)
- gemini-1.5-pro - High-performance version
- gemini-1.5-flash - Fast response version
### PaLM 2 Series

Note: the PaLM 2 (Bison) models have been deprecated on Vertex AI in favor of Gemini; prefer Gemini models for new configurations.
- text-bison - Text generation model
- chat-bison - Conversational model
- code-bison - Code generation model
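To sanity-check that one of these models is reachable from your project, a quick smoke test with the Vertex AI Python SDK can help. This is a minimal sketch, assuming `pip install google-cloud-aiplatform` and working credentials; the project ID and region are placeholders:

```python
# Minimal smoke test: call a Gemini model through the Vertex AI Python SDK.
# Assumes google-cloud-aiplatform is installed and Application Default
# Credentials are configured (see the Setup Steps below).
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Reply with a one-sentence greeting.")
print(response.text)
```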
## Configuration
### Basic Configuration
Configure models in `config.yaml` or `~/.bytebuddy/config.yaml`:
```yaml
models:
  - name: "vertex-gemini"
    provider: "vertexai"
    model: "gemini-pro"
    roles: ["chat", "edit"]
    env:
      projectId: "your-gcp-project-id"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 8192
```

### Using Service Account
```yaml
models:
  - name: "vertex-sa"
    provider: "vertexai"
    model: "gemini-pro"
    roles: ["chat"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
      credentials: "${GOOGLE_APPLICATION_CREDENTIALS}"
```

### Multi-Model Configuration
```yaml
models:
  - name: "vertex-gemini-pro"
    provider: "vertexai"
    model: "gemini-1.5-pro"
    roles: ["chat", "edit"]
    env:
      projectId: "my-project"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 8192
  - name: "vertex-gemini-flash"
    provider: "vertexai"
    model: "gemini-1.5-flash"
    roles: ["autocomplete"]
    env:
      projectId: "my-project"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2048
```

## Configuration Fields
### Required Fields

- `name`: Unique identifier for the model configuration
- `provider`: Set to `"vertexai"`
- `model`: Model name

### Environment Configuration (`env`)

- `projectId`: GCP project ID (required)
- `location`: GCP region (required)
- `credentials`: Path to a service account credentials file (optional)

### Optional Fields

- `roles`: Model roles (`chat`, `edit`, `apply`, `autocomplete`)
- `capabilities`: Model capabilities (e.g., `image_input` for vision models)
- `defaultCompletionOptions`:
  - `temperature`: Controls randomness (0-1)
  - `maxTokens`: Maximum output tokens
  - `topP`: Nucleus sampling parameter
  - `topK`: Number of sampling candidates
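For reference, these completion options correspond closely to the Vertex AI Python SDK's `GenerationConfig`. The sketch below is purely illustrative of that mapping (placeholder project, region, and prompt), not bytebuddy's own code path:

```python
# Hedged sketch: the completion options above, expressed as the Vertex AI
# Python SDK's GenerationConfig. All values are placeholders.
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="your-gcp-project-id", location="us-central1")

config = GenerationConfig(
    temperature=0.7,         # temperature: randomness (0-1)
    max_output_tokens=8192,  # maxTokens: output length cap
    top_p=0.95,              # topP: nucleus sampling
    top_k=40,                # topK: number of sampling candidates
)
model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Hello!", generation_config=config)
print(response.text)
```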
## Environment Variables

```bash
# ~/.bashrc or ~/.zshrc
export GCP_PROJECT_ID="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
```

## Setup Steps
### 1. Create GCP Project
- Visit the Google Cloud Console
- Create a new project or select an existing one
- Note the project ID
### 2. Enable Vertex AI API

```bash
# Enable the Vertex AI API
gcloud services enable aiplatform.googleapis.com

# Or enable it via the web console:
# navigate to "APIs & Services" > "Enable APIs and Services",
# then search for and enable "Vertex AI API"
```

### 3. Configure Authentication
#### Option A: Use Service Account (Recommended for Production)

```bash
# Create the service account
gcloud iam service-accounts create vertex-ai-sa \
  --display-name="Vertex AI Service Account"

# Grant the Vertex AI User role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:vertex-ai-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Create and download a key
gcloud iam service-accounts keys create ~/vertex-ai-key.json \
  --iam-account=vertex-ai-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com

# Point client libraries at the key
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/vertex-ai-key.json"
```
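If the key needs to be supplied programmatically rather than via the environment variable, the Python SDK also accepts explicit credentials. A minimal sketch, with an illustrative key path:

```python
# Hedged sketch: load the downloaded key explicitly instead of relying on
# GOOGLE_APPLICATION_CREDENTIALS. The key path below is illustrative.
from google.oauth2 import service_account
import vertexai

creds = service_account.Credentials.from_service_account_file(
    "/path/to/vertex-ai-key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
vertexai.init(project="your-gcp-project-id", location="us-central1", credentials=creds)
```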
#### Option B: Use Application Default Credentials (for Development)

```bash
# Log in with your user account
gcloud auth application-default login
```

### 4. Verify Configuration
```bash
# Test access
gcloud ai models list --region=us-central1
```

## Use Case Configurations
### Code Generation

```yaml
models:
  - name: "code-assistant"
    provider: "vertexai"
    model: "gemini-1.5-pro"
    roles: ["chat", "edit"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096
```

### General Chat
```yaml
models:
  - name: "chat-bot"
    provider: "vertexai"
    model: "gemini-pro"
    roles: ["chat"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
```

### Image Understanding
```yaml
models:
  - name: "vision"
    provider: "vertexai"
    model: "gemini-pro-vision"
    roles: ["chat"]
    capabilities: ["image_input"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
```
## Troubleshooting

### Common Errors
- Permission Denied: Check service account permissions
- API Not Enabled: Enable Vertex AI API
- Invalid Project: Verify project ID
- Region Not Supported: Check model availability in the region
- Quota Exceeded: Request quota increase
### Debugging Steps

- Verify service account credentials (a quick programmatic check is sketched after this list)
- Check API enablement status
- Confirm project ID and region
- View Cloud Logging for details
- Monitor quota usage
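For the credential check in the first step, this small sketch prints which credentials and project the Google client libraries actually resolve (it uses `google-auth`, which is installed alongside the Vertex AI SDK):

```python
# Hedged sketch: confirm what the client libraries see before debugging further.
import google.auth

credentials, project = google.auth.default()
print("Resolved project:", project)
print("Credential type:", type(credentials).__name__)
```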
## Regional Availability
| Region | Location Code | Gemini Pro | Gemini 1.5 Pro |
|---|---|---|---|
| US Central | us-central1 | ✅ | ✅ |
| US East | us-east1 | ✅ | ✅ |
| Europe West | europe-west1 | ✅ | ✅ |
| Asia Northeast | asia-northeast1 | ✅ | ❌ |
## Best Practices
### 1. Security
- Use service accounts for production
- Implement least privilege access control
- Rotate service account keys regularly
- Store credentials securely
- Enable audit logging
### 2. Performance Optimization
- Choose nearest region
- Use Flash model for faster responses
- Enable caching of repeated responses (a client-side sketch follows this list)
- Set reasonable timeout values
- Implement request batching
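One cheap form of caching is client-side memoization of identical prompts. The sketch below is an illustration under that assumption, using ordinary in-process caching rather than Vertex AI's managed context caching:

```python
# Hedged sketch: memoize identical prompts so repeats don't issue paid calls.
# This is plain in-process caching, not Vertex AI's managed context caching.
from functools import lru_cache

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

@lru_cache(maxsize=256)
def cached_generate(prompt: str) -> str:
    return model.generate_content(prompt).text

cached_generate("What is nucleus sampling?")  # network call
cached_generate("What is nucleus sampling?")  # served from cache
```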
### 3. Cost Control
- Monitor API usage and costs
- Use appropriate model for the task
- Set budget alerts
- Implement rate limiting
- Use quota management
### 4. Reliability

- Implement retry logic with exponential backoff (see the sketch after this list)
- Handle errors gracefully
- Monitor service health
- Use multiple regions for failover
- Log all errors and exceptions
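For the retry item above, a minimal sketch of exponential backoff with jitter around a Vertex AI call; it retries only transient errors (429 quota, 503 unavailable), and all names and values are placeholders:

```python
# Hedged sketch: exponential backoff with jitter around a Vertex AI call.
import random
import time

from google.api_core import exceptions as gexc
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

def generate_with_backoff(prompt: str, max_attempts: int = 5, base_delay: float = 1.0):
    for attempt in range(max_attempts):
        try:
            return model.generate_content(prompt)
        except (gexc.ResourceExhausted, gexc.ServiceUnavailable):
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Backoff schedule: 1s, 2s, 4s, ... plus random jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))

print(generate_with_backoff("ping").text)
```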
## Quotas and Limits
### Default Quotas
- Requests per minute: Varies by model
- Tokens per minute: Varies by model
- Concurrent requests: 100
### Request Quota Increase

1. Visit the Quotas page in the Google Cloud Console
2. Filter by "Vertex AI API"
3. Select the quota to increase
4. Click "EDIT QUOTAS"
5. Submit the request
## Cost Optimization
### Pricing Tiers

Indicative rates in USD; Vertex AI pricing changes over time, so confirm current rates on the official pricing page.
| Model | Input (per 1K tokens) | Output (per 1K tokens) |
|---|---|---|
| Gemini 1.5 Pro | $0.00125 | $0.00375 |
| Gemini 1.5 Flash | $0.000075 | $0.00030 |
| Gemini Pro | $0.000125 | $0.000375 |
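As a rough worked example against the table above (the token counts are hypothetical, purely for illustration):

```python
# Back-of-the-envelope cost estimate from the table above (USD per 1K tokens).
input_tokens = 120_000
output_tokens = 8_000
rate_in, rate_out = 0.000075, 0.00030  # Gemini 1.5 Flash rates from the table

cost = (input_tokens / 1000) * rate_in + (output_tokens / 1000) * rate_out
print(f"Estimated cost: ${cost:.4f}")  # 0.009 + 0.0024 = $0.0114
```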
### Optimization Tips
- Use Flash model for simple tasks
- Set appropriate maxTokens limits
- Enable response caching
- Batch similar requests
- Monitor and analyze usage patterns