Ollama Guide
Ollama is a tool for running large language models locally. This guide explains how to integrate Ollama with ByteBuddy so you can use local AI models for development assistance.
What is Ollama?
Ollama is a tool that makes it easy to run large language models locally on your machine. Benefits of using Ollama with ByteBuddy include:
- Privacy: Keep your code and data local
- No Internet Required: Work offline
- Cost Effective: No API costs
- Customizable: Run any compatible model
- Fast: Low latency for local models
Installing Ollama
macOS
Install using Homebrew:
brew install ollama
Or download from ollama.ai:
curl -fsSL https://ollama.ai/install.sh | sh
Windows
Download the Windows installer from ollama.ai and run the installer.
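After the installer finishes, you can confirm from a terminal that the CLI is available:
# Print the installed version
ollama --version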
Linux
Install using curl:
curl -fsSL https://ollama.ai/install.sh | sh
Or, if your distribution provides an Ollama package, use your package manager:
# Ubuntu/Debian
sudo apt install ollama
# Fedora
sudo dnf install ollama
# CentOS/RHEL
sudo yum install ollama
Starting Ollama
Running the Service
Start Ollama service:
# Start Ollama daemon
ollama serve
# Or run in background
ollama serve &
# Check if running
ollama list
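To confirm the server is responding, you can also query its HTTP API (the same endpoint used later in Troubleshooting):
# Should return a JSON list of installed models
curl http://localhost:11434/api/tags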
System Startup
Enable Ollama to start automatically:
# macOS (using Homebrew)
brew services start ollama
# Linux (systemd)
sudo systemctl enable ollama
sudo systemctl start ollama
Popular Models
Code-Focused Models
Download models optimized for coding:
# CodeLlama (Meta's coding model)
ollama pull codellama:7b
ollama pull codellama:13b
ollama pull codellama:34b
# CodeLlama with Python instruction tuning
ollama pull codellama:7b-python
# Llama 3 (general purpose with good coding abilities)
ollama pull llama3:8b
ollama pull llama3:70b
# Mistral (efficient and capable)
ollama pull mistral:7b
ollama pull mixtral:8x7b
General Purpose Models
Other useful models:
# Phi-3 (Microsoft's compact model)
ollama pull phi3:3.8b
# Gemma (Google's models)
ollama pull gemma:2b
ollama pull gemma:7b
# Neural Chat (Intel's model)
ollama pull neural-chat:7b
Configuring ByteBuddy with Ollama
Basic Configuration
Configure ByteBuddy to use Ollama models:
# .bytebuddy/config.yaml
models:
  - name: "local-coding"
    provider: "ollama"
    model: "codellama:7b"
    baseURL: "http://localhost:11434"
    role: "chat"
  - name: "local-fast"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
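Before pointing ByteBuddy at these entries, it is worth confirming that Ollama can load and answer with each model from the command line (the prompts here are arbitrary examples):
# One-off prompts against the models referenced above
ollama run codellama:7b "Write a hello world function in Python."
ollama run mistral:7b "Complete this function: def add(a, b):"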
Advanced Configuration
Fine-tune Ollama model settings:
models:
  - name: "local-coding-advanced"
    provider: "ollama"
    model: "codellama:13b"
    baseURL: "http://localhost:11434"
    role: "chat"
    # Model parameters
    temperature: 0.2        # Lower for more deterministic output
    maxTokens: 2048
    topP: 0.9
    frequencyPenalty: 0.1
    presencePenalty: 0.1
    # Ollama-specific options
    options:
      num_ctx: 4096         # Context window size
      num_predict: 512      # Max tokens to predict
      repeat_last_n: 64     # How far back to look for repetition
      repeat_penalty: 1.1   # Penalty for repetition
      top_k: 40             # Limit sampling to top K choices
      tfs_z: 1.0            # Tail free sampling
      mirostat: 0           # Mirostat sampling (0=disabled, 1=Mirostat, 2=Mirostat 2.0)
      mirostat_tau: 5.0     # Mirostat tau
      mirostat_eta: 0.1     # Mirostat eta
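These options correspond to the options object in Ollama's HTTP API, so you can experiment with values directly against the server before committing them to your ByteBuddy config:
# Try sampling options directly against Ollama's generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:13b",
  "prompt": "Explain what repeat_penalty controls.",
  "stream": false,
  "options": { "temperature": 0.2, "num_ctx": 4096, "top_k": 40, "repeat_penalty": 1.1 }
}'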
Multiple Model Setup
Configure multiple Ollama models for different tasks:
models:
  # Fast autocomplete model
  - name: "ollama-autocomplete"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
    temperature: 0.1
  # Powerful coding model
  - name: "ollama-coding"
    provider: "ollama"
    model: "codellama:13b"
    baseURL: "http://localhost:11434"
    role: "chat"
    temperature: 0.3
  # General purpose model
  - name: "ollama-general"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    role: "chat"
    temperature: 0.7
Model Management
Listing Models
View downloaded models:
# List all models
ollama list
# Show model details
ollama show codellama:7b
# List models as JSON via the HTTP API
curl http://localhost:11434/api/tags
Managing Models
Download, remove, and manage models:
# Pull a model
ollama pull llama3:8b
# Remove a model
ollama rm codellama:7b
# Copy a model
ollama cp llama3:8b my-custom-llama:latest
# Create a model from Modelfile
ollama create my-model -f ./Modelfile
Model Information
Get detailed model information:
# Show model parameters
ollama show llama3:8b --modelfile
# Show license information
ollama show llama3:8b --license
Performance Optimization
Hardware Acceleration
Enable GPU acceleration for better performance:
# Check whether loaded models are running on GPU or CPU
ollama ps
# Ollama automatically uses available GPUs
# For NVIDIA GPUs, ensure CUDA is installed
# For AMD GPUs, ensure ROCm is installed
# For Apple Silicon, Metal is automatically used
Resource Management
Control resource usage:
models:
  - name: "resource-managed-model"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    options:
      num_thread: 8   # Limit CPU threads
      num_gpu: 1      # Number of layers to offload to the GPU
      num_keep: 4     # Tokens from the prompt to keep when the context window fills up
Context Management
Optimize context window usage:
models:
  - name: "context-optimized"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    options:
      num_ctx: 4096    # Context window size
      num_batch: 512   # Batch size for prompt processing
Custom Models
Creating Modelfiles
Create custom model configurations:
# Modelfile
FROM llama3:8b
# Set system message
SYSTEM """
You are a helpful AI coding assistant integrated with ByteBuddy.
Focus on providing accurate, secure, and efficient code solutions.
"""
# Add custom parameters
PARAMETER temperature 0.3
PARAMETER repeat_penalty 1.1
# Add template for better formatting
TEMPLATE """
<|begin_of_text|>{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
"""
Build the custom model:
ollama create my-custom-model -f ./Modelfile
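Once it builds, give the new model a quick smoke test before referencing it from ByteBuddy by name:
# Try the custom model directly
ollama run my-custom-model "Summarize what a Python context manager does."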
Model Customization
Customize models for specific tasks:
# CodingAssistant Modelfile
FROM codellama:7b
SYSTEM """
You are an expert coding assistant. Provide:
1. Working code examples
2. Explanations of code functionality
3. Best practices and security considerations
4. Error handling and edge cases
"""
PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.2
PARAMETER top_k 50
PARAMETER top_p 0.9
TEMPLATE """
[INST] <<SYS>>
{{ .System }}
<</SYS>>
{{ .Prompt }} [/INST] {{ .Response }}
"""
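Build this second Modelfile the same way, then register the result in your ByteBuddy config like any other Ollama model. The name coding-assistant below is just an example:
ollama create coding-assistant -f ./Modelfile
Then reference it in .bytebuddy/config.yaml:
models:
  - name: "custom-coding-assistant"
    provider: "ollama"
    model: "coding-assistant"
    baseURL: "http://localhost:11434"
    role: "chat"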
Integration with Development Workflow
Development Environment Setup
Configure your development environment:
# .bytebuddy/config.yaml
models:
  - name: "dev-local"
    provider: "ollama"
    model: "codellama:7b"
    baseURL: "http://localhost:11434"
    role: "chat"
  - name: "dev-fast"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
preferences:
  # Optimize for local models
  maxContextTokens: 4096
  requestTimeout: 60   # Local models may take longer
Project-Specific Configuration
Tailor models to specific projects:
# Project-specific .bytebuddy/config.yaml
models:
  - name: "python-project"
    provider: "ollama"
    model: "codellama:7b-python"
    baseURL: "http://localhost:11434"
    role: "chat"
  - name: "javascript-project"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    role: "chat"
Troubleshooting
Common Issues
Connection Problems
# Check if Ollama is running
ps aux | grep ollama
# Check Ollama service status
sudo systemctl status ollama # Linux
brew services list | grep ollama # macOS
# Test connection
curl http://localhost:11434/api/tags
Model Loading Issues
# Check available models
ollama list
# Re-pull a problematic model
ollama pull codellama:7b
# Inspect the model's Modelfile and parameters
ollama show codellama:7b --modelfile
Performance Issues
# Check system resources
htop # or Activity Monitor on macOS
# Reduce load by limiting concurrent requests
OLLAMA_NUM_PARALLEL=1 ollama serve
# Per-model CPU threads are controlled via the num_thread option (see Resource Management)
# Check GPU usage
nvidia-smi # for NVIDIA GPUs
Debugging Commands
# Enable verbose logging
OLLAMA_DEBUG=1 ollama serve
# Check Ollama logs
journalctl -u ollama -f # Linux
tail -f /usr/local/var/log/ollama.log # macOS
# Test model directly
ollama run llama3:8b "Hello, how are you?"
# Show the model's parameters
ollama show llama3:8b --parameters
Best Practices
Model Selection
- Match Model to Task: Use specialized models for specific tasks
- Consider Hardware: Choose models that fit your hardware
- Balance Speed vs Quality: Smaller models for fast tasks, larger for complex ones
- Test Multiple Models: Find what works best for your use case; a quick comparison loop is shown below
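A minimal way to compare candidates is to send the same prompt to each model you have pulled and compare the answers (the models listed here are the ones used earlier in this guide):
# Compare a few local models on the same prompt
for m in codellama:7b mistral:7b llama3:8b; do
  echo "===== $m ====="
  ollama run "$m" "Write a Python function that reverses a string."
done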
Resource Management
- Monitor Usage: Keep track of CPU/GPU/memory usage
- Limit Concurrent Models: Don't run too many models simultaneously
- Clean Up Unused Models: Remove models you're not using
- Use Appropriate Context: Don't use larger context windows than needed
Security
- Keep Models Updated: Regularly update to latest versions
- Verify Sources: Only use trusted model sources
- Monitor Network: By default, Ollama listens only on localhost; verify this with the check shown after this list
- Secure Configuration: Protect your configuration files
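A quick way to confirm the localhost-only default is to check which address the API port is bound to (assuming the default port 11434):
# Linux: the listener should be bound to 127.0.0.1, not 0.0.0.0
ss -ltnp | grep 11434
# macOS
lsof -iTCP:11434 -sTCP:LISTEN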
Advanced Features
Model Serving
Serve models on custom ports:
# Serve on a custom address and port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
# Configure ByteBuddy to use the custom port or a remote host:
models:
  - name: "remote-ollama"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://192.168.1.100:11435"   # Remote Ollama server on the custom port
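Before adding a remote entry like this, check that the server is reachable from your machine (the address and port are the example values above):
curl http://192.168.1.100:11435/api/tags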
Model Chaining
Use multiple models in sequence:
rules:
  - name: "code-review-process"
    prompt: |
      First, analyze this code for security issues using a security-focused model.
      Then, review for performance optimizations using a performance-focused model.
      Finally, check for best practices using a general coding model.
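ByteBuddy handles the sequencing here, but if you want to prototype a chain like this outside the editor, you can call Ollama's generate endpoint once per model and feed each answer into the next prompt. The sketch below is a minimal example using Python with the requests library; the model names are simply the ones pulled earlier in this guide:
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model, prompt):
    """Send one non-streaming prompt to a local Ollama model and return its text."""
    resp = requests.post(OLLAMA_URL, json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

code = "def divide(a, b):\n    return a / b"

# Stage 1: review with a code-focused model
findings = ask("codellama:7b", "List potential bugs and edge cases in this code:\n" + code)

# Stage 2: a general model turns the findings into a fixed version
fixed = ask("llama3:8b", "Given these findings:\n" + findings + "\n\nRewrite the code to address them:\n" + code)

print(fixed)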
Next Steps
After setting up Ollama with ByteBuddy, explore these related guides:
- How to Self-Host a Model - Learn about other local model options
- Running ByteBuddy Without Internet - Work completely offline
- Plan Mode Guide - Use advanced planning features with local models