
Google Vertex AI

Google Vertex AI is Google Cloud's unified machine learning platform, providing access to enterprise-grade, managed AI models.

Supported Models

Gemini Series

  • gemini-pro - General-purpose text model
  • gemini-pro-vision - Multimodal model (supports image input)
  • gemini-1.5-pro - High-capability, long-context version
  • gemini-1.5-flash - Faster, lower-cost version

PaLM 2 Series

  • text-bison - Text generation model
  • chat-bison - Conversational model
  • code-bison - Code generation model

Configuration

Basic Configuration

Configure in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "vertex-gemini"
    provider: "vertexai"
    model: "gemini-pro"
    roles: ["chat", "edit"]
    env:
      projectId: "your-gcp-project-id"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 8192

Using a Service Account

yaml
models:
  - name: "vertex-sa"
    provider: "vertexai"
    model: "gemini-pro"
    roles: ["chat"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
      credentials: "${GOOGLE_APPLICATION_CREDENTIALS}"

Multi-Model Configuration

yaml
models:
  - name: "vertex-gemini-pro"
    provider: "vertexai"
    model: "gemini-1.5-pro"
    roles: ["chat", "edit"]
    env:
      projectId: "my-project"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 8192

  - name: "vertex-gemini-flash"
    provider: "vertexai"
    model: "gemini-1.5-flash"
    roles: ["autocomplete"]
    env:
      projectId: "my-project"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2048

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "vertexai"
  • model: Vertex AI model name (e.g., gemini-1.5-pro)

Environment Configuration (env)

  • projectId: GCP project ID (required)
  • location: GCP region (required)
  • credentials: Service account credentials file path (optional)

Optional Fields

  • roles: Model roles [chat, edit, apply, autocomplete]
  • capabilities: Model capabilities (e.g., image_input for vision models)
  • defaultCompletionOptions:
    • temperature: Controls randomness (0-1)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling parameter
    • topK: Number of highest-probability tokens considered when sampling
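
For reference, a single entry that exercises the optional fields might look like the sketch below; the model choice and sampling values are illustrative, not recommendations:

yaml
models:
  - name: "vertex-tuned"
    provider: "vertexai"
    model: "gemini-1.5-pro"
    roles: ["chat", "edit"]
    capabilities: ["image_input"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4096
      topP: 0.95
      topK: 40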

Environment Variables

bash
# ~/.bashrc or ~/.zshrc
export GCP_PROJECT_ID="your-project-id"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
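
After editing the file, reload the shell configuration and confirm both variables are visible to the shell that launches ByteBuddy:

bash
# Reload the shell configuration and verify both variables are set
source ~/.bashrc
echo "$GCP_PROJECT_ID"
echo "$GOOGLE_APPLICATION_CREDENTIALS"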

Setup Steps

1. Create GCP Project

  1. Visit the Google Cloud Console (console.cloud.google.com)
  2. Create a new project or select an existing one
  3. Note the project ID
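
The same can be done from the terminal; a gcloud sketch, with the project ID as a placeholder:

bash
# Create a new project (skip if reusing an existing one)
gcloud projects create your-gcp-project-id

# Point gcloud at it and print the ID to confirm
gcloud config set project your-gcp-project-id
gcloud config get-value project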

2. Enable Vertex AI API

bash
# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com

# Or enable via web console
# Navigate to "APIs & Services" > "Enable APIs and Services"
# Search and enable "Vertex AI API"

3. Configure Authentication

Option A: Use a Service Account (Recommended for Production)

bash
# Create service account
gcloud iam service-accounts create vertex-ai-sa \
  --display-name="Vertex AI Service Account"

# Grant permissions
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
  --member="serviceAccount:vertex-ai-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Create and download key
gcloud iam service-accounts keys create ~/vertex-ai-key.json \
  --iam-account=vertex-ai-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com

# Set environment variable
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/vertex-ai-key.json"
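
As an optional sanity check, the downloaded key can be activated in gcloud directly to confirm it authenticates:

bash
# Authenticate gcloud itself with the new key and list active credentials
gcloud auth activate-service-account \
  --key-file="$HOME/vertex-ai-key.json"
gcloud auth list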

Option B: Use Application Default Credentials (for Development)

bash
# Login
gcloud auth application-default login

4. Verify Configuration

bash
# Test access
gcloud ai models list --region=us-central1
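
To verify end to end that a model actually responds, a direct REST call works as well. A sketch, assuming gemini-pro in us-central1; replace PROJECT_ID with your project:

bash
# Send a one-line prompt to gemini-pro via the Vertex AI REST API
curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:generateContent" \
  -d '{"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}'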

Use Case Configurations

Code Generation

yaml
models:
  - name: "code-assistant"
    provider: "vertexai"
    model: "gemini-1.5-pro"
    roles: ["chat", "edit"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.2
      maxTokens: 4096

General Chat

yaml
models:
  - name: "chat-bot"
    provider: "vertexai"
    model: "gemini-pro"
    roles: ["chat"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

Image Understanding

yaml
models:
  - name: "vision"
    provider: "vertexai"
    model: "gemini-pro-vision"
    roles: ["chat"]
    capabilities: ["image_input"]
    env:
      projectId: "${GCP_PROJECT_ID}"
      location: "us-central1"

Troubleshooting

Common Errors

  1. Permission Denied: Check service account permissions
  2. API Not Enabled: Enable Vertex AI API
  3. Invalid Project: Verify project ID
  4. Region Not Supported: Check model availability in the region
  5. Quota Exceeded: Request quota increase

Debugging Steps

  1. Verify service account credentials
  2. Check API enablement status
  3. Confirm project ID and region
  4. View Cloud Logging for details
  5. Monitor quota usage
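
Most of the checks above map to a single gcloud command; a sketch (the Cloud Logging filter is an assumption and may need adjusting for your setup):

bash
# 1. Is the expected account active?
gcloud auth list

# 2. Is the Vertex AI API enabled on this project?
gcloud services list --enabled --filter="aiplatform.googleapis.com"

# 3. Which project is gcloud pointed at?
gcloud config get-value project

# 4. Pull recent Vertex AI errors from Cloud Logging
gcloud logging read 'severity>=ERROR AND protoPayload.serviceName="aiplatform.googleapis.com"' --limit=10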

Regional Availability

Region         | Location Code   | Gemini Pro | Gemini 1.5 Pro
US Central     | us-central1     | ✓          | ✓
US East        | us-east1        | ✓          | ✓
Europe West    | europe-west1    | ✓          | ✓
Asia Northeast | asia-northeast1 | ✓          | ✓

Best Practices

1. Security

  • Use service accounts for production
  • Implement least privilege access control
  • Rotate service account keys regularly
  • Store credentials securely
  • Enable audit logging

2. Performance Optimization

  • Choose nearest region
  • Use Flash model for faster responses
  • Enable caching
  • Set reasonable timeout values
  • Implement request batching

3. Cost Control

  • Monitor API usage and costs
  • Use appropriate model for the task
  • Set budget alerts (scriptable; see the sketch after this list)
  • Implement rate limiting
  • Use quota management
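
Budget alerts can be scripted with gcloud; a sketch, where the billing account ID is a placeholder and the amount is illustrative:

bash
# Create a $100 budget that alerts at 90% of spend
gcloud billing budgets create \
  --billing-account=XXXXXX-XXXXXX-XXXXXX \
  --display-name="vertex-ai-budget" \
  --budget-amount=100USD \
  --threshold-rule=percent=0.9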

4. Reliability

  • Implement retry logic with exponential backoff (see the sketch after this list)
  • Handle errors gracefully
  • Monitor service health
  • Use multiple regions for failover
  • Log all errors and exceptions
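
A minimal shell sketch of retry with exponential backoff, reusing the REST call from the verification step (PROJECT_ID is a placeholder):

bash
# Retry up to 5 times, doubling the wait: 2s, 4s, 8s, 16s, 32s
url="https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-pro:generateContent"
for attempt in 1 2 3 4 5; do
  curl -sf \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "$url" \
    -d '{"contents": [{"role": "user", "parts": [{"text": "ping"}]}]}' && break
  echo "Attempt ${attempt} failed; retrying in $((2 ** attempt))s" >&2
  sleep $(( 2 ** attempt ))
done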

Quotas and Limits

Default Quotas

  • Requests per minute: Varies by model
  • Tokens per minute: Varies by model
  • Concurrent requests: 100

Request Quota Increase

  1. Visit the Quotas page in the Cloud Console (IAM & Admin > Quotas)
  2. Filter by "Vertex AI API"
  3. Select quota to increase
  4. Click "EDIT QUOTAS"
  5. Submit request

Cost Optimization

Pricing Tiers

Model            | Input (per 1K tokens) | Output (per 1K tokens)
Gemini 1.5 Pro   | $0.00125              | $0.00375
Gemini 1.5 Flash | $0.000075             | $0.00030
Gemini Pro       | $0.000125             | $0.000375

Optimization Tips

  1. Use Flash model for simple tasks
  2. Set appropriate maxTokens limits
  3. Enable response caching
  4. Batch similar requests
  5. Monitor and analyze usage patterns