Ollama Guide

Ollama is a powerful tool for running large language models locally. This guide explains how to integrate Ollama with ByteBuddy to leverage local AI models for development assistance.

What is Ollama?

Ollama is a tool that makes it easy to run large language models locally on your machine. Benefits of using Ollama with ByteBuddy include:

  • Privacy: Your code and data never leave your machine
  • Works Offline: No internet connection needed once models are downloaded
  • Cost Effective: No per-request API costs
  • Customizable: Run any compatible model
  • Fast: No network round-trips, so latency depends only on your hardware

Installing Ollama

macOS

Install using Homebrew:

bash
brew install ollama

Or download the macOS app directly from ollama.ai.

Windows

Download the Windows installer from ollama.ai and run it.

Linux

Install using curl:

bash
curl -fsSL https://ollama.ai/install.sh | sh

Ollama is not packaged in the default Ubuntu, Debian, Fedora, or RHEL repositories, so the install script above is the officially supported method. Some distributions (for example, Arch Linux) carry community-maintained packages; check your distribution's package index if you prefer a native package.

Starting Ollama

Running the Service

Start Ollama service:

bash
# Start Ollama daemon
ollama serve

# Or run in background
ollama serve &

# Check if running
ollama list
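
To confirm the server is reachable, you can also query its REST API directly. This is a minimal check against the default endpoint (http://localhost:11434); the generation example assumes you have already pulled a model such as llama3:8b (see the model sections below).

bash
# List installed models via the REST API
curl http://localhost:11434/api/tags

# Run a quick non-streaming generation
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "prompt": "Reply with a one-line greeting.",
  "stream": false
}'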

System Startup

Enable Ollama to start automatically:

bash
# macOS (using Homebrew)
brew services start ollama

# Linux (systemd)
sudo systemctl enable ollama
sudo systemctl start ollama

Downloading Models

Code-Focused Models

Download models optimized for coding:

bash
# CodeLlama (Meta's coding model)
ollama pull codellama:7b
ollama pull codellama:13b
ollama pull codellama:34b

# CodeLlama specialized for Python
ollama pull codellama:7b-python

# Llama 3 (general purpose with good coding abilities)
ollama pull llama3:8b
ollama pull llama3:70b

# Mistral (efficient and capable)
ollama pull mistral:7b
ollama pull mixtral:8x7b
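
After pulling a model, it is worth a quick smoke test from the command line before wiring it into ByteBuddy:

bash
# Run a one-off prompt against a pulled model
ollama run codellama:7b "Write a Python function that checks whether a string is a palindrome."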

General Purpose Models

Other useful models:

bash
# Phi-3 (Microsoft's compact model)
ollama pull phi3:3.8b

# Gemma (Google's models)
ollama pull gemma:2b
ollama pull gemma:7b

# Neural Chat (Intel's model)
ollama pull neural-chat:7b

Configuring ByteBuddy with Ollama

Basic Configuration

Configure ByteBuddy to use Ollama models:

yaml
# .bytebuddy/config.yaml
models:
  - name: "local-coding"
    provider: "ollama"
    model: "codellama:7b"
    baseURL: "http://localhost:11434"
    role: "chat"

  - name: "local-fast"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"

Advanced Configuration

Fine-tune Ollama model settings:

yaml
models:
  - name: "local-coding-advanced"
    provider: "ollama"
    model: "codellama:13b"
    baseURL: "http://localhost:11434"
    role: "chat"

    # Model parameters
    temperature: 0.2 # Lower for more deterministic output
    maxTokens: 2048
    topP: 0.9
    frequencyPenalty: 0.1
    presencePenalty: 0.1

    # Ollama-specific options
    options:
      num_ctx: 4096 # Context window
      num_predict: 512 # Max tokens to predict
      repeat_last_n: 64 # How far back to look for repetition
      repeat_penalty: 1.1 # Penalty for repetition
      top_k: 40 # Limit to top K choices
      tfs_z: 1.0 # Tail free sampling
      mirostat: 0 # Mirostat sampling (0=disabled, 1=Mirostat, 2=Mirostat 2.0)
      mirostat_tau: 5.0 # Mirostat tau
      mirostat_eta: 0.1 # Mirostat eta
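
The Ollama-specific options correspond to the options object of Ollama's generate API, so you can experiment with values directly against the API before committing them to the config. A minimal sketch using the same settings as above:

bash
# Try the sampling options against Ollama's API directly
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:13b",
  "prompt": "Write a Python function that reverses a string.",
  "stream": false,
  "options": {
    "temperature": 0.2,
    "num_ctx": 4096,
    "num_predict": 512,
    "repeat_penalty": 1.1,
    "top_k": 40,
    "top_p": 0.9
  }
}'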

Multiple Model Setup

Configure multiple Ollama models for different tasks:

yaml
models:
  # Fast autocomplete model
  - name: "ollama-autocomplete"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
    temperature: 0.1

  # Powerful coding model
  - name: "ollama-coding"
    provider: "ollama"
    model: "codellama:13b"
    baseURL: "http://localhost:11434"
    role: "chat"
    temperature: 0.3

  # General purpose model
  - name: "ollama-general"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    role: "chat"
    temperature: 0.7

Model Management

Listing Models

View downloaded models:

bash
# List all models
ollama list

# Show model details
ollama show codellama:7b

# Get model information as JSON from the REST API
curl http://localhost:11434/api/tags

Managing Models

Download, remove, and manage models:

bash
# Pull a model
ollama pull llama3:8b

# Remove a model
ollama rm codellama:7b

# Copy a model
ollama cp llama3:8b my-custom-llama:latest

# Create a model from Modelfile
ollama create my-model -f ./Modelfile

Model Information

Get detailed model information:

bash
# Show the model's Modelfile (including its parameters)
ollama show llama3:8b --modelfile

# Show license information
ollama show llama3:8b --license

Performance Optimization

Hardware Acceleration

Enable GPU acceleration for better performance:

bash
# Check whether a loaded model is running on the GPU (see the PROCESSOR column)
ollama ps

# Ollama automatically uses available GPUs
# For NVIDIA GPUs, ensure CUDA is installed
# For AMD GPUs, ensure ROCm is installed
# For Apple Silicon, Metal is automatically used

Resource Management

Control resource usage:

yaml
models:
  - name: "resource-managed-model"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"

    options:
      num_thread: 8 # CPU threads to use for inference
      num_gpu: 1 # Number of model layers to offload to the GPU (0 = CPU only)
      num_keep: 4 # Tokens from the start of the prompt to keep when the context fills
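
You can see how these settings play out at runtime: ollama ps shows which models are currently loaded and whether they run on the CPU or GPU, and sending a request with keep_alive set to 0 unloads a model immediately to free memory.

bash
# Show models currently loaded in memory
ollama ps

# Unload a model right away by setting keep_alive to 0
curl http://localhost:11434/api/generate -d '{
  "model": "llama3:8b",
  "keep_alive": 0
}'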

Context Management

Optimize context window usage:

yaml
models:
  - name: "context-optimized"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"

    options:
      num_ctx: 4096 # Context window size
      num_batch: 512 # Batch size for prompt processing

Custom Models

Creating Modelfiles

Create custom model configurations:

dockerfile
# Modelfile
FROM llama3:8b

# Set system message
SYSTEM """
You are a helpful AI coding assistant integrated with ByteBuddy.
Focus on providing accurate, secure, and efficient code solutions.
"""

# Add custom parameters
PARAMETER temperature 0.3
PARAMETER repeat_penalty 1.1

# Add template for better formatting
TEMPLATE """
<|begin_of_text|>{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>
"""

Build the custom model:

bash
ollama create my-custom-model -f ./Modelfile
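
Then give it a quick test and confirm the parameters took effect:

bash
# Run a one-off prompt against the custom model
ollama run my-custom-model "Explain what a race condition is in two sentences."

# Inspect the generated Modelfile
ollama show my-custom-model --modelfile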

Model Customization

Customize models for specific tasks:

dockerfile
# CodingAssistant Modelfile
FROM codellama:7b

SYSTEM """
You are an expert coding assistant. Provide:
1. Working code examples
2. Explanations of code functionality
3. Best practices and security considerations
4. Error handling and edge cases
"""

PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.2
PARAMETER top_k 50
PARAMETER top_p 0.9

TEMPLATE """
[INST] <<SYS>>
{{ .System }}
<</SYS>>

{{ .Prompt }} [/INST] {{ .Response }}
"""

Integration with Development Workflow

Development Environment Setup

Configure your development environment:

yaml
# .bytebuddy/config.yaml
models:
  - name: "dev-local"
    provider: "ollama"
    model: "codellama:7b"
    baseURL: "http://localhost:11434"
    role: "chat"

  - name: "dev-fast"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"

preferences:
  # Optimize for local models
  maxContextTokens: 4096
  requestTimeout: 60 # Local models may take longer

Project-Specific Configuration

Tailor models to specific projects:

yaml
# Project-specific .bytebuddy/config.yaml
models:
  - name: "python-project"
    provider: "ollama"
    model: "codellama:7b-python"
    baseURL: "http://localhost:11434"
    role: "chat"

  - name: "javascript-project"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    role: "chat"

Troubleshooting

Common Issues

Connection Problems

bash
# Check if Ollama is running
ps aux | grep ollama

# Check Ollama service status
sudo systemctl status ollama  # Linux
brew services list | grep ollama  # macOS

# Test connection
curl http://localhost:11434/api/tags

Model Loading Issues

bash
# Check available models
ollama list

# Re-pull a model whose download appears corrupted
ollama pull codellama:7b

# Verify the model is readable by inspecting its Modelfile
ollama show codellama:7b --modelfile

Performance Issues

bash
# Check system resources
htop  # or Activity Monitor on macOS

# Limit concurrency to reduce load (per-model CPU threads are set via the num_thread option)
OLLAMA_MAX_LOADED_MODELS=1 OLLAMA_NUM_PARALLEL=1 ollama serve

# Check GPU usage
nvidia-smi  # for NVIDIA GPUs

Debugging Commands

bash
# Enable verbose logging
OLLAMA_DEBUG=1 ollama serve

# Check Ollama logs
journalctl -u ollama -f  # Linux
tail -f ~/.ollama/logs/server.log  # macOS

# Test model directly
ollama run llama3:8b "Hello, how are you?"

# Get model information
ollama show llama3:8b

Best Practices

Model Selection

  1. Match Model to Task: Use specialized models for specific tasks
  2. Consider Hardware: Choose models that fit your hardware
  3. Balance Speed vs Quality: Smaller models for fast tasks, larger for complex ones
  4. Test Multiple Models: Find what works best for your use case

Resource Management

  1. Monitor Usage: Keep track of CPU/GPU/memory usage
  2. Limit Concurrent Models: Don't run too many models simultaneously
  3. Clean Up Unused Models: Remove models you're not using
  4. Use Appropriate Context: Don't use larger context windows than needed

Security

  1. Keep Models Updated: Regularly update to latest versions
  2. Verify Sources: Only use trusted model sources
  3. Monitor Network: Ollama should only listen on localhost by default (a quick check is shown after this list)
  4. Secure Configuration: Protect your configuration files
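
For the network check, confirm that the Ollama port is bound to localhost rather than all interfaces:

bash
# Linux: the listener should be on 127.0.0.1, not 0.0.0.0
ss -tlnp | grep 11434

# macOS
lsof -nP -iTCP:11434 -sTCP:LISTEN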

Advanced Features

Model Serving

Serve Ollama on a different interface or port by setting OLLAMA_HOST:

bash
# Bind to all interfaces on a custom port
OLLAMA_HOST=0.0.0.0:11435 ollama serve

Then point ByteBuddy at that address:

yaml
models:
  - name: "remote-ollama"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://192.168.1.100:11435" # Remote Ollama server on the custom port

Model Chaining

Use multiple models in sequence:

yaml
rules:
  - name: "code-review-process"
    prompt: |
      First, analyze this code for security issues using a security-focused model.
      Then, review for performance optimizations using a performance-focused model.
      Finally, check for best practices using a general coding model.

Next Steps

After setting up Ollama with ByteBuddy, explore these related guides: