
Ollama

Ollama is a local model runner that lets you run a wide range of open-source large language models on your own machine, keeping data private and allowing fully offline use.

Supported Models

Meta LLaMA Series

  • llama3 - LLaMA 3 series models
  • llama3:8b - LLaMA 3 8B parameter version
  • llama3:70b - LLaMA 3 70B parameter version
  • llama2 - LLaMA 2 series models
  • codellama - Code-specialized model

Other Open-Source Models

  • mistral - Mistral AI's open-source model
  • qwen - Alibaba Qwen (Tongyi Qianwen)
  • gemma - Google's open-source model
  • phi3 - Microsoft's Phi-3 model

Configuration

Basic Configuration

Configure Ollama in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "local-llama3"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000
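
A quick way to confirm the endpoint in apiBase is reachable is Ollama's model-listing API:

bash
# Should return a JSON list of locally installed models
curl http://localhost:11434/api/tags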

Multi-Model Configuration

yaml
models:
  - name: "llama3-chat"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000

  - name: "codellama-edit"
    provider: "ollama"
    model: "codellama"
    apiBase: "http://localhost:11434"
    roles: ["edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 1500

  - name: "qwen-autocomplete"
    provider: "ollama"
    model: "qwen:7b"
    apiBase: "http://localhost:11434"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 500
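
Each model referenced above must already be pulled locally, or requests for that role will fail:

bash
# Pull the models used by the chat, edit, and autocomplete roles
ollama pull llama3
ollama pull codellama
ollama pull qwen:7b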

Custom Endpoint Configuration

yaml
models:
  - name: "remote-ollama"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://192.168.1.100:11434"
    roles: ["chat"]
    requestOptions:
      timeout: 120000
      verifySsl: false
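
For the remote endpoint above to work, the Ollama instance on the remote machine must listen on more than localhost. A minimal sketch, assuming the server at 192.168.1.100:

bash
# On the remote host: bind to all interfaces instead of 127.0.0.1
OLLAMA_HOST=0.0.0.0 ollama serve

# From the client: verify the endpoint is reachable
curl http://192.168.1.100:11434/api/tags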

Advanced Configuration

With Authentication

yaml
models:
  - name: "secure-ollama"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    requestOptions:
      timeout: 60000
      headers:
        "Authorization": "Bearer ${OLLAMA_TOKEN}"

Complete Configuration Example

yaml
models:
  - name: "ollama-complete"
    provider: "ollama"
    model: "llama3:8b"
    apiBase: "http://localhost:11434"
    roles: ["chat", "edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4000
      topP: 0.9
      stream: true
    requestOptions:
      timeout: 120000
      verifySsl: true

Use Case Configurations

Local Development

yaml
models:
  - name: "local-dev"
    provider: "ollama"
    model: "codellama:7b"
    apiBase: "http://localhost:11434"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2000

Privacy Protection

yaml
models:
  - name: "private-chat"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 1000
    requestOptions:
      timeout: 60000

Fast Response

yaml
models:
  - name: "fast-local"
    provider: "ollama"
    model: "phi3:mini"
    apiBase: "http://localhost:11434"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 200
    requestOptions:
      timeout: 10000

Installation and Setup

1. Install Ollama

bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download from https://ollama.com/download
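
After installing, verify that the CLI is available:

bash
# Print the installed Ollama version
ollama --version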

2. Start Ollama Service

bash
# Start service
ollama serve

# Or run in background
nohup ollama serve > ollama.log 2>&1 &
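
To confirm the service is up, hit the root endpoint:

bash
# A healthy server responds with "Ollama is running"
curl http://localhost:11434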

3. Download Models

bash
# Download LLaMA 3
ollama pull llama3

# Download code model
ollama pull codellama

# Download lightweight model
ollama pull phi3:mini

# List downloaded models
ollama list
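
Models are large (a 7B model is typically around 4GB on disk), so remove ones you no longer use:

bash
# Delete a model to free disk space
ollama rm llama2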

4. Test Models

bash
# Test conversation
ollama run llama3 "Hello, how are you?"

# Interactive chat
ollama run llama3
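
You can also test without the interactive CLI by calling the HTTP API directly, which is closer to how an editor integration talks to Ollama:

bash
# Non-streaming completion via the HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Hello, how are you?",
  "stream": false
}'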

Advanced Usage

Custom Models

bash
# Write a Modelfile defining the base model, system prompt, and parameters
cat > Modelfile <<'EOF'
FROM llama3
SYSTEM """You are a helpful AI assistant."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
EOF

# Create the custom model from the Modelfile
ollama create mymodel -f Modelfile
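
Once created, the custom model can be used like any built-in one, both on the CLI and as the model field in config.yaml:

bash
# Chat with the custom model interactively
ollama run mymodel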

GPU Acceleration

Ensure the system has the appropriate GPU drivers installed:

  • NVIDIA GPU: install the CUDA drivers; Ollama automatically detects and uses the GPU.
  • Apple Silicon (M1/M2/M3): Ollama automatically uses Metal Performance Shaders; no extra setup is required.
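
To verify that a loaded model is actually running on the GPU, ollama ps reports each loaded model's processor placement:

bash
# Shows loaded models and their GPU/CPU split
ollama ps

# On NVIDIA systems, watch GPU memory usage directly
nvidia-smi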

Remote Access

bash
# Allow remote access
OLLAMA_HOST=0.0.0.0 ollama serve

# Configure firewall rules
sudo ufw allow 11434
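
On Linux installs that use the official install script, Ollama runs as a systemd service, so the host binding is better set in a service override to survive restarts:

bash
# Opens an editor for an override; add under [Service]:
#   Environment="OLLAMA_HOST=0.0.0.0"
sudo systemctl edit ollama.service

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama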

Troubleshooting

Q: Ollama connection failed?

A: Check the following:

  • Ollama service is running: ps aux | grep ollama
  • Port 11434 is listening: netstat -an | grep 11434
  • Firewall settings are correct

Q: Model download is slow?

A:

  • Use mirror sources or a proxy (see the sketch below)
  • Choose smaller model versions
  • Consider using pre-downloaded model files
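
Ollama honors the standard proxy environment variables when pulling models; the variable must be visible to the server process, not just your shell. A minimal sketch with a placeholder proxy address:

bash
# Placeholder proxy address; model downloads go over HTTPS
HTTPS_PROXY=http://proxy.example.com:8080 ollama serve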

Q: Out of memory?

A:

  • Choose smaller models (7B instead of 70B)
  • Close other memory-intensive programs
  • Consider increasing system memory

Q: How to update models?

A:

bash
# Update model to latest version
ollama pull llama3:latest

# Inspect model details (parameters, template, license)
ollama show llama3

# Available tags for each model are listed at https://ollama.com/library

Q: Response time too long?

A:

  • Use smaller models
  • Enable GPU acceleration
  • Reduce maxTokens setting
  • Use quantized versions (e.g., q4 builds; see the example below)
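
Quantized builds are published as extra tags on each model's library page, and exact tag names vary per model, so check https://ollama.com/library before pulling. An example with an assumed tag name:

bash
# q4_0 quantization of the 8B instruct model; verify the tag exists first
ollama pull llama3:8b-instruct-q4_0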

Best Practices

1. Model Selection

  • Development/Testing: Use smaller 7B models
  • Production: Choose size based on hardware
  • Privacy Sensitive: Always use local models
  • Speed Priority: Use quantized versions

2. Hardware Optimization

  • GPU: Use CUDA or Metal acceleration
  • Memory: At least 8GB RAM for 7B models, 16GB for 13B models
  • Storage: Use SSD for faster loading

3. Security Considerations

  • Local deployment ensures data privacy
  • Regularly update Ollama and models
  • Limit network access (localhost only if needed)

4. Performance Tuning

  • Preload commonly used models (see the sketch below)
  • Use appropriate temperature parameters
  • Enable streaming responses
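
A model can be preloaded, and kept resident, by sending a request with no prompt; the keep_alive field controls how long it stays in memory after the last request:

bash
# Load llama3 into memory and keep it resident for one hour
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "keep_alive": "1h"
}'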

Model Recommendations

Use Case            Recommended Model   VRAM Required   Features
General Chat        llama3:8b           ~8GB            Balanced performance
Code Generation     codellama:7b        ~8GB            Code-specialized
Lightweight Tasks   phi3:mini           ~4GB            Fast response
High-Quality Chat   llama3:70b          ~40GB           High quality
Chinese Optimized   qwen:7b             ~8GB            Good Chinese support