# LM Studio

LM Studio lets you run large language models locally, giving you complete privacy control and offline usage.
## Supported Models

### Lightweight Models

- Qwen2.5-1.5B-Instruct - Fast responses, low resource consumption
- Phi-3-mini-4k-instruct - Microsoft's lightweight model
- Gemma-2B-it - Google's lightweight model

### Medium Models

- Qwen2.5-7B-Instruct - Balanced performance and resource use
- Llama-3.1-8B-Instruct - Meta's popular model
- Mistral-7B-Instruct-v0.2 - Excellent open-source model

### High-Performance Models

- Qwen2.5-14B-Instruct - High-quality output
- Llama-3.1-70B-Instruct - Top-tier performance
- Mixtral-8x7B-Instruct-v0.1 - Mixture-of-Experts (MoE) architecture
## Configuration

### Basic Configuration

Configure the model in `config.yaml` or `~/.bytebuddy/config.yaml`:

```yaml
models:
  - name: "lmstudio-local"
    provider: "lmstudio"
    model: "local-model"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```

### Specific Model Configuration
```yaml
models:
  - name: "qwen-local"
    provider: "lmstudio"
    model: "Qwen2.5-7B-Instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 8192
      topP: 0.9
```

### Multi-Model Configuration
```yaml
models:
  - name: "lmstudio-fast"
    provider: "lmstudio"
    model: "Phi-3-mini-4k-instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048
  - name: "lmstudio-quality"
    provider: "lmstudio"
    model: "Qwen2.5-14B-Instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4096
```

## Configuration Fields
### Required Fields

- `name`: Unique identifier for the model configuration
- `provider`: Set to `"lmstudio"`
- `apiBase`: LM Studio server address

### Optional Fields

- `model`: Model name (defaults to the model loaded in LM Studio)
- `roles`: Model roles: `chat`, `edit`, `apply`, `autocomplete`
- `defaultCompletionOptions`:
  - `temperature`: Controls randomness (0-2)
  - `maxTokens`: Maximum number of tokens to generate
  - `topP`: Nucleus sampling parameter
  - `topK`: Number of sampling candidates
- `requestOptions`:
  - `timeout`: Request timeout (milliseconds)
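
For reference, here is a sketch that sets every field listed above in a single entry; the values are illustrative, not tuned recommendations:

```yaml
models:
  - name: "lmstudio-full"                # required: unique identifier
    provider: "lmstudio"                 # required
    apiBase: "http://localhost:1234/v1"  # required: server address
    model: "Qwen2.5-7B-Instruct"         # optional: defaults to the loaded model
    roles: ["chat", "edit", "apply", "autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7                   # randomness, 0-2
      maxTokens: 4096                    # generation cap
      topP: 0.9                          # nucleus sampling
      topK: 40                           # sampling candidates
    requestOptions:
      timeout: 60000                     # 60 seconds, in milliseconds
```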
## Installation and Setup

### 1. Install LM Studio

```bash
# macOS
# Download from https://lmstudio.ai

# Windows
# Download the installer from https://lmstudio.ai

# Linux
# Download the AppImage from https://lmstudio.ai
```

### 2. Start LM Studio
- Launch LM Studio application
- Enable server mode in settings
- Download and load required models
- Start the local server
### 3. Configure the Server

In LM Studio:

- Click the "Server" tab
- Set the port (default 1234)
- Select the model to load
- Click "Start Server"
### 4. Verify the Connection

```bash
# Test whether the server is running
curl http://localhost:1234/v1/models
```
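
To confirm completions work end to end, you can also send a minimal request to the OpenAI-compatible chat endpoint; the `model` value here is a placeholder, since LM Studio answers with whichever model is currently loaded:

```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 32
  }'
```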
## Use Case Configurations

### Local Development
```yaml
models:
  - name: "local-dev"
    provider: "lmstudio"
    model: "codellama-7b"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2000
```

### Privacy Protection
```yaml
models:
  - name: "private-chat"
    provider: "lmstudio"
    model: "llama-3-8b"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 1000
    requestOptions:
      timeout: 60000
```

### Remote Access
```yaml
models:
  - name: "remote-lmstudio"
    provider: "lmstudio"
    apiBase: "http://192.168.1.100:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
```

## Performance Optimization
### GPU Acceleration

Ensure LM Studio uses the GPU:
- Enable GPU in LM Studio settings
- Select appropriate quantization level
- Adjust context length
### Memory Management
```yaml
models:
  - name: "optimized"
    provider: "lmstudio"
    apiBase: "http://localhost:1234/v1"
    defaultCompletionOptions:
      maxTokens: 2048  # Reduce memory usage
```

## Troubleshooting
**Q: LM Studio connection fails?**

A: Check the following (see the quick probe below):

- The LM Studio server is running
- Port 1234 is available
- Firewall settings allow the connection
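
On macOS or Linux, these commands give a quick read on whether the server is reachable:

```bash
# Is anything listening on port 1234?
lsof -i :1234

# Does the API respond? (-s silences progress, -f fails on HTTP errors)
curl -sf http://localhost:1234/v1/models || echo "server not reachable"
```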
**Q: Model loading is slow?**

A:

- Choose smaller models
- Use quantized versions
- Ensure sufficient RAM/VRAM
**Q: Response time is too long?**

A:

- Use smaller models
- Enable GPU acceleration
- Reduce the `maxTokens` setting
## Best Practices

### 1. Model Selection

- Fast response: use 1.5B-7B models
- High quality: use 14B+ models
- Privacy sensitive: always use local models

### 2. Hardware Requirements

- Minimum: 16 GB RAM, models that fit in about 8 GB
- Recommended: 32 GB RAM, GPU acceleration
- Optimal: 64 GB RAM, high-end GPU
### 3. Security Considerations
- Local deployment ensures data privacy
- Regularly update LM Studio
- Limit network access (localhost only)
### 4. Performance Tuning

- Preload frequently used models (see the sketch below)
- Use appropriate temperature parameters
- Select an appropriate quantization level
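
Assuming the `lms` CLI from the setup steps is available, a model can be preloaded so the first request doesn't pay the load cost; the model key below is illustrative, and the exact keys on your machine are shown by `lms ls`:

```bash
# List downloaded models, then preload one before the first request
lms ls
lms load Qwen2.5-7B-Instruct
```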