# Ollama

Ollama is a local model runner that lets you serve open-source large language models on your own machine, keeping your data private and allowing fully offline usage.
## Supported Models

### Meta LLaMA Series
- llama3 - LLaMA 3 series models
- llama3:8b - LLaMA 3 8B parameter version
- llama3:70b - LLaMA 3 70B parameter version
- llama2 - LLaMA 2 series models
### Other Popular Models
- codellama - Code-specialized model
- mistral - Mistral AI's open-source model
- qwen - Alibaba Qwen (Tongyi Qianwen)
- gemma - Google's open-source model
- phi3 - Microsoft's Phi-3 model
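A tag after the model name selects a specific parameter size or variant. For example, to pull some of the variants listed above:

```bash
# Default tag
ollama pull llama3

# Specific parameter sizes / lightweight variants
ollama pull llama3:8b
ollama pull qwen:7b
ollama pull phi3:mini
```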
## Configuration

### Basic Configuration

Configure Ollama in `config.yaml` or `~/.bytebuddy/config.yaml`:

```yaml
models:
  - name: "local-llama3"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000
```

### Multi-Model Configuration
```yaml
models:
  - name: "llama3-chat"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2000
  - name: "codellama-edit"
    provider: "ollama"
    model: "codellama"
    apiBase: "http://localhost:11434"
    roles: ["edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 1500
  - name: "qwen-autocomplete"
    provider: "ollama"
    model: "qwen:7b"
    apiBase: "http://localhost:11434"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 500
```

### Custom Endpoint Configuration
```yaml
models:
  - name: "remote-ollama"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://192.168.1.100:11434"
    roles: ["chat"]
    requestOptions:
      timeout: 120000
      verifySsl: false
```

## Advanced Configuration

### With Authentication
```yaml
models:
  - name: "secure-ollama"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    requestOptions:
      timeout: 60000
      headers:
        "Authorization": "Bearer ${OLLAMA_TOKEN}"
```

### Complete Configuration Example
```yaml
models:
  - name: "ollama-complete"
    provider: "ollama"
    model: "llama3:8b"
    apiBase: "http://localhost:11434"
    roles: ["chat", "edit", "apply"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4000
      topP: 0.9
      stream: true
    requestOptions:
      timeout: 120000
      verifySsl: true
```

## Use Case Configurations

### Local Development
```yaml
models:
  - name: "local-dev"
    provider: "ollama"
    model: "codellama:7b"
    apiBase: "http://localhost:11434"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2000
```

### Privacy Protection
```yaml
models:
  - name: "private-chat"
    provider: "ollama"
    model: "llama3"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 1000
    requestOptions:
      timeout: 60000
```

### Fast Response
```yaml
models:
  - name: "fast-local"
    provider: "ollama"
    model: "phi3:mini"
    apiBase: "http://localhost:11434"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.1
      maxTokens: 200
    requestOptions:
      timeout: 10000
```

## Installation and Setup

### 1. Install Ollama
```bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Download the installer from https://ollama.com/download
```
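To confirm the installation, check that the `ollama` CLI is available on your PATH:

```bash
# Print the installed Ollama version
ollama --version
```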
### 2. Start Ollama Service

```bash
# Start the service
ollama serve

# Or run it in the background
nohup ollama serve > ollama.log 2>&1 &
```
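Once running, the service listens on port 11434 by default. A quick way to confirm it is up is to query the server's version endpoint:

```bash
# Should return a small JSON payload with the server version
curl http://localhost:11434/api/version
```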
### 3. Download Models

```bash
# Download LLaMA 3
ollama pull llama3

# Download the code-specialized model
ollama pull codellama

# Download a lightweight model
ollama pull phi3:mini

# List downloaded models
ollama list
```

### 4. Test Models
```bash
# Test a single prompt
ollama run llama3 "Hello, how are you?"

# Interactive chat
ollama run llama3
```
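Because ByteBuddy talks to Ollama through the HTTP API configured in `apiBase`, it can also be useful to test the API directly. A minimal sketch using Ollama's `/api/generate` endpoint:

```bash
# Single non-streaming completion over the HTTP API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```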
## Advanced Usage

### Custom Models
```bash
# Create a custom model from a Modelfile
ollama create mymodel -f Modelfile
```

Example `Modelfile`:

```
FROM llama3
SYSTEM """You are a helpful AI assistant."""
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```
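The custom model can then be referenced from ByteBuddy like any other Ollama model; a minimal sketch reusing the `mymodel` name created above (the `name` value is arbitrary):

```yaml
models:
  - name: "my-custom-assistant"
    provider: "ollama"
    model: "mymodel"
    apiBase: "http://localhost:11434"
    roles: ["chat"]
```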
### GPU Acceleration

Make sure your system has the proper GPU drivers installed:
- NVIDIA GPU: install the CUDA drivers; Ollama automatically detects and uses the GPU.
- Apple Silicon (M1/M2/M3): Ollama automatically uses Metal Performance Shaders.
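To check whether a loaded model is actually using the GPU, you can inspect the driver and Ollama's process list (the `ollama ps` command is available in recent Ollama releases):

```bash
# Confirm the NVIDIA driver sees the GPU (NVIDIA systems only)
nvidia-smi

# Show loaded models and whether they run on GPU or CPU
ollama ps
```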
### Remote Access

```bash
# Allow remote access (listen on all interfaces)
OLLAMA_HOST=0.0.0.0 ollama serve

# Configure firewall rules
sudo ufw allow 11434
```
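From a client machine, you can verify the exposed endpoint is reachable before pointing `apiBase` at it (substitute your server's address for the example IP):

```bash
# Lists the models available on the remote Ollama server
curl http://192.168.1.100:11434/api/tags
```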
## Troubleshooting

Q: Ollama connection failed?
A: Check the following:
- The Ollama service is running: `ps aux | grep ollama`
- Port 11434 is listening: `netstat -an | grep 11434`
- Firewall settings are correct
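If those checks pass but ByteBuddy still cannot connect, issue the same kind of request the editor would make against the configured `apiBase`:

```bash
# A 200 response with a JSON model list means the API is reachable
curl -i http://localhost:11434/api/tags
```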
Q: Model downloads are slow?
A:
- Use mirror sources or proxy
- Choose smaller model versions
- Consider using pre-downloaded model files
Q: Running out of memory?
A:
- Choose smaller models (7B instead of 70B)
- Close other memory-intensive programs
- Consider increasing system memory
Q: How do I update models?
A:

```bash
# Pull the latest version of a model
ollama pull llama3:latest

# Show details of the locally installed model
ollama show llama3
```

Q: Responses take too long?
A:
- Use smaller models
- Enable GPU acceleration
- Reduce the `maxTokens` setting
- Use quantized versions (e.g., q4 variants)
## Best Practices

### 1. Model Selection
- Development/Testing: Use smaller 7B models
- Production: Choose size based on hardware
- Privacy Sensitive: Always use local models
- Speed Priority: Use quantized versions
### 2. Hardware Optimization
- GPU: Use CUDA or Metal acceleration
- Memory: At least 16GB RAM for 7B models
- Storage: Use SSD for faster loading
### 3. Security Considerations
- Local deployment ensures data privacy
- Regularly update Ollama and models
- Limit network access (localhost only if needed)
### 4. Performance Tuning

- Preload commonly used models (see the sketch below)
- Use appropriate temperature parameters
- Enable streaming responses
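One way to preload a model (and keep it resident in memory) is to send an empty generate request with a `keep_alive` value, which recent Ollama versions support; a sketch assuming the default local endpoint:

```bash
# Load llama3 into memory and keep it loaded indefinitely
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "keep_alive": -1
}'
```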
## Model Recommendations
| Use Case | Recommended Model | VRAM Required | Features |
|---|---|---|---|
| General Chat | llama3:8b | ~8GB | Balanced performance |
| Code Generation | codellama:7b | ~8GB | Code-specialized |
| Lightweight Tasks | phi3:mini | ~4GB | Fast response |
| High-Quality Chat | llama3:70b | ~40GB | High quality |
| Chinese Optimized | qwen:7b | ~8GB | Good Chinese support |