Ollama Guide
Ollama is a tool for running large language models locally. This guide explains how to integrate Ollama with ByteBuddy so you can use local AI models for development assistance.
What is Ollama?
Ollama is a tool that makes it easy to run large language models locally on your machine. Benefits of using Ollama with ByteBuddy include:
- Privacy: Keep your code and data local
- No Internet Required: Work offline
- Cost Effective: No API costs
- Customizable: Run any compatible model
- Fast: Low latency for local models
Installing Ollama
macOS
Install using Homebrew:
brew install ollama
Or download from ollama.ai:
curl -fsSL https://ollama.ai/install.sh | sh
Windows
Download the Windows installer from ollama.ai and run the installer.
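After the installer finishes, you can confirm from a terminal that the CLI is available:
# Print the installed version
ollama --version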
Linux
Install using curl:
curl -fsSL https://ollama.ai/install.sh | sh
Or, if your distribution provides an Ollama package, use your package manager:
# Ubuntu/Debian
sudo apt install ollama
# Fedora
sudo dnf install ollama
# CentOS/RHEL
sudo yum install ollama
Starting Ollama
Running the Service
Start Ollama service:
# Start Ollama daemon
ollama serve
# Or run in background
ollama serve &
# Check if running
ollama list
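To confirm the server is responding, you can also query its HTTP API (the same endpoint used later in Troubleshooting):
# Should return a JSON list of installed models
curl http://localhost:11434/api/tags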
System Startup
Enable Ollama to start automatically:
# macOS (using Homebrew)
brew services start ollama
# Linux (systemd)
sudo systemctl enable ollama
sudo systemctl start ollama
Popular Models
Code-Focused Models
Download models optimized for coding:
# CodeLlama (Meta's coding model)
ollama pull codellama:7b
ollama pull codellama:13b
ollama pull codellama:34b
# CodeLlama with Python instruction tuning
ollama pull codellama:7b-python
# Llama 3 (general purpose with good coding abilities)
ollama pull llama3:8b
ollama pull llama3:70b
# Mistral (efficient and capable)
ollama pull mistral:7b
ollama pull mixtral:8x7b
General Purpose Models
Other useful models:
# Phi-3 (Microsoft's compact model)
ollama pull phi3:3.8b
# Gemma (Google's models)
ollama pull gemma:2b
ollama pull gemma:7b
# Neural Chat (Intel's model)
ollama pull neural-chat:7b
Configuring ByteBuddy with Ollama
Basic Configuration
Configure ByteBuddy to use Ollama models:
# .bytebuddy/config.yaml
models:
  - name: "local-coding"
    provider: "ollama"
    model: "codellama:7b"
    baseURL: "http://localhost:11434"
    role: "chat"
  - name: "local-fast"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
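Before pointing ByteBuddy at these entries, it is worth confirming that Ollama can load and answer with each model from the command line (the prompts here are arbitrary examples):
# One-off prompts against the models referenced above
ollama run codellama:7b "Write a hello world function in Python."
ollama run mistral:7b "Complete this function: def add(a, b):"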
Advanced Configuration
Fine-tune Ollama model settings:
models:
  - name: "local-coding-advanced"
    provider: "ollama"
    model: "codellama:13b"
    baseURL: "http://localhost:11434"
    role: "chat"
    # Model parameters
    temperature: 0.2        # Lower for more deterministic output
    maxTokens: 2048
    topP: 0.9
    frequencyPenalty: 0.1
    presencePenalty: 0.1
    # Ollama-specific options
    options:
      num_ctx: 4096         # Context window size
      num_predict: 512      # Max tokens to predict
      repeat_last_n: 64     # How far back to look for repetition
      repeat_penalty: 1.1   # Penalty for repetition
      top_k: 40             # Limit sampling to top K choices
      tfs_z: 1.0            # Tail free sampling
      mirostat: 0           # Mirostat sampling (0=disabled, 1=Mirostat, 2=Mirostat 2.0)
      mirostat_tau: 5.0     # Mirostat tau
      mirostat_eta: 0.1     # Mirostat eta
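These options correspond to the options object in Ollama's HTTP API, so you can experiment with values directly against the server before committing them to your ByteBuddy config:
# Try sampling options directly against Ollama's generate endpoint
curl http://localhost:11434/api/generate -d '{
  "model": "codellama:13b",
  "prompt": "Explain what repeat_penalty controls.",
  "stream": false,
  "options": { "temperature": 0.2, "num_ctx": 4096, "top_k": 40, "repeat_penalty": 1.1 }
}'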
Multiple Model Setup
Configure multiple Ollama models for different tasks:
models:
  # Fast autocomplete model
  - name: "ollama-autocomplete"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
    temperature: 0.1
  # Powerful coding model
  - name: "ollama-coding"
    provider: "ollama"
    model: "codellama:13b"
    baseURL: "http://localhost:11434"
    role: "chat"
    temperature: 0.3
  # General purpose model
  - name: "ollama-general"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    role: "chat"
    temperature: 0.7
Model Management
Listing Models
View downloaded models:
# List all models
ollama list
# Show model details
ollama show codellama:7b
# List models as JSON via the HTTP API
curl http://localhost:11434/api/tags
Managing Models
Download, remove, and manage models:
# Pull a model
ollama pull llama3:8b
# Remove a model
ollama rm codellama:7b
# Copy a model
ollama cp llama3:8b my-custom-llama:latest
# Create a model from Modelfile
ollama create my-model -f ./Modelfile
Model Information
Get detailed model information:
# Show model parameters
ollama show llama3:8b --modelfile
# Show license information
ollama show llama3:8b --license
Performance Optimization
Hardware Acceleration
Enable GPU acceleration for better performance:
# Check whether loaded models are running on GPU or CPU
ollama ps
# Ollama automatically uses available GPUs
# For NVIDIA GPUs, ensure CUDA is installed
# For AMD GPUs, ensure ROCm is installed
# For Apple Silicon, Metal is automatically used
Resource Management
Control resource usage:
models:
  - name: "resource-managed-model"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    options:
      num_thread: 8   # Limit CPU threads
      num_gpu: 1      # Number of layers to offload to the GPU
      num_keep: 4     # Tokens from the prompt to keep when the context window fills up
Context Management
Optimize context window usage:
models:
  - name: "context-optimized"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    options:
      num_ctx: 4096    # Context window size
      num_batch: 512   # Batch size for prompt processing
Custom Models
Creating Modelfiles
Create custom model configurations:
# Modelfile
FROM llama3:8b
# Set system message
SYSTEM """
You are a helpful AI coding assistant integrated with ByteBuddy.
Focus on providing accurate, secure, and efficient code solutions.
"""
# Add custom parameters
PARAMETER temperature 0.3
PARAMETER repeat_penalty 1.1
# Add template for better formatting
TEMPLATE """
<|begin_of_text|>{{ if .System }}<|start_header_id|>system<|end_header_id|>
{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
{{ .Response }}<|eot_id|>
"""
Build the custom model:
ollama create my-custom-model -f ./Modelfile
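Once it builds, give the new model a quick smoke test before referencing it from ByteBuddy by name:
# Try the custom model directly
ollama run my-custom-model "Summarize what a Python context manager does."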
Model Customization
Customize models for specific tasks:
# CodingAssistant Modelfile
FROM codellama:7b
SYSTEM """
You are an expert coding assistant. Provide:
1. Working code examples
2. Explanations of code functionality
3. Best practices and security considerations
4. Error handling and edge cases
"""
PARAMETER temperature 0.2
PARAMETER repeat_penalty 1.2
PARAMETER top_k 50
PARAMETER top_p 0.9
TEMPLATE """
[INST] <<SYS>>
{{ .System }}
<</SYS>>
{{ .Prompt }} [/INST] {{ .Response }}
"""
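Build this second Modelfile the same way, then register the result in your ByteBuddy config like any other Ollama model. The name coding-assistant below is just an example:
ollama create coding-assistant -f ./Modelfile
Then reference it in .bytebuddy/config.yaml:
models:
  - name: "custom-coding-assistant"
    provider: "ollama"
    model: "coding-assistant"
    baseURL: "http://localhost:11434"
    role: "chat"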
Integration with Development Workflow
Development Environment Setup
Configure your development environment:
# .bytebuddy/config.yaml
models:
  - name: "dev-local"
    provider: "ollama"
    model: "codellama:7b"
    baseURL: "http://localhost:11434"
    role: "chat"
  - name: "dev-fast"
    provider: "ollama"
    model: "mistral:7b"
    baseURL: "http://localhost:11434"
    role: "autocomplete"
preferences:
  # Optimize for local models
  maxContextTokens: 4096
  requestTimeout: 60   # Local models may take longer
Project-Specific Configuration
Tailor models to specific projects:
# Project-specific .bytebuddy/config.yaml
models:
  - name: "python-project"
    provider: "ollama"
    model: "codellama:7b-python"
    baseURL: "http://localhost:11434"
    role: "chat"
  - name: "javascript-project"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://localhost:11434"
    role: "chat"
Troubleshooting
Common Issues
Connection Problems
# Check if Ollama is running
ps aux | grep ollama
# Check Ollama service status
sudo systemctl status ollama # Linux
brew services list | grep ollama # macOS
# Test connection
curl http://localhost:11434/api/tags
Model Loading Issues
# Check available models
ollama list
# Re-pull a problematic model
ollama pull codellama:7b
# Inspect the model's Modelfile and parameters
ollama show codellama:7b --modelfile
Performance Issues
# Check system resources
htop # or Activity Monitor on macOS
# Reduce load by limiting concurrent requests
OLLAMA_NUM_PARALLEL=1 ollama serve
# Per-model CPU threads are controlled via the num_thread option (see Resource Management)
# Check GPU usage
nvidia-smi # for NVIDIA GPUs
Debugging Commands
# Enable verbose logging
OLLAMA_DEBUG=1 ollama serve
# Check Ollama logs
journalctl -u ollama -f # Linux
tail -f /usr/local/var/log/ollama.log # macOS
# Test model directly
ollama run llama3:8b "Hello, how are you?"
# Show the model's parameters
ollama show llama3:8b --parameters
Best Practices
Model Selection
- Match Model to Task: Use specialized models for specific tasks
- Consider Hardware: Choose models that fit your hardware
- Balance Speed vs Quality: Smaller models for fast tasks, larger for complex ones
- Test Multiple Models: Find what works best for your use case; a quick comparison loop is shown below
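A minimal way to compare candidates is to send the same prompt to each model you have pulled and compare the answers (the models listed here are the ones used earlier in this guide):
# Compare a few local models on the same prompt
for m in codellama:7b mistral:7b llama3:8b; do
  echo "===== $m ====="
  ollama run "$m" "Write a Python function that reverses a string."
done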
Resource Management
- Monitor Usage: Keep track of CPU/GPU/memory usage
- Limit Concurrent Models: Don't run too many models simultaneously
- Clean Up Unused Models: Remove models you're not using
- Use Appropriate Context: Don't use larger context windows than needed
Security
- Keep Models Updated: Regularly update to latest versions
- Verify Sources: Only use trusted model sources
- Monitor Network: By default, Ollama listens only on localhost; verify this with the check shown after this list
- Secure Configuration: Protect your configuration files
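A quick way to confirm the localhost-only default is to check which address the API port is bound to (assuming the default port 11434):
# Linux: the listener should be bound to 127.0.0.1, not 0.0.0.0
ss -ltnp | grep 11434
# macOS
lsof -iTCP:11434 -sTCP:LISTEN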
Advanced Features
Model Serving
Serve models on custom ports:
# Serve on a custom address and port
OLLAMA_HOST=0.0.0.0:11435 ollama serve
# Configure ByteBuddy to use the custom port or a remote host:
models:
  - name: "remote-ollama"
    provider: "ollama"
    model: "llama3:8b"
    baseURL: "http://192.168.1.100:11435"   # Remote Ollama server on the custom port
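Before adding a remote entry like this, check that the server is reachable from your machine (the address and port are the example values above):
curl http://192.168.1.100:11435/api/tags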
Model Chaining
Use multiple models in sequence:
rules:
  - name: "code-review-process"
    prompt: |
      First, analyze this code for security issues using a security-focused model.
      Then, review for performance optimizations using a performance-focused model.
      Finally, check for best practices using a general coding model.
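ByteBuddy handles the sequencing here, but if you want to prototype a chain like this outside the editor, you can call Ollama's generate endpoint once per model and feed each answer into the next prompt. The sketch below is a minimal example using Python with the requests library; the model names are simply the ones pulled earlier in this guide:
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(model, prompt):
    """Send one non-streaming prompt to a local Ollama model and return its text."""
    resp = requests.post(OLLAMA_URL, json={"model": model, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["response"]

code = "def divide(a, b):\n    return a / b"

# Stage 1: review with a code-focused model
findings = ask("codellama:7b", "List potential bugs and edge cases in this code:\n" + code)

# Stage 2: a general model turns the findings into a fixed version
fixed = ask("llama3:8b", "Given these findings:\n" + findings + "\n\nRewrite the code to address them:\n" + code)

print(fixed)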
Next Steps
After setting up Ollama with ByteBuddy, explore these related guides:
- How to Self-Host a Model - Learn about other local model options
- Running ByteBuddy Without Internet - Work completely offline
- Plan Mode Guide - Use advanced planning features with local models