
LM Studio

LM Studio lets you run large language models locally, giving you full control over your data and the ability to work offline.

Supported Models

Lightweight Models

  • Qwen2.5-1.5B-Instruct - Fast responses with low resource consumption
  • Phi-3-mini-4k-instruct - Lightweight model from Microsoft
  • Gemma-2B-it - Lightweight model from Google

Medium Models

  • Qwen2.5-7B-Instruct - Balanced performance and resource usage
  • Llama-3.1-8B-Instruct - Popular model from Meta
  • Mistral-7B-Instruct-v0.2 - Strong open-source model

High-Performance Models

  • Qwen2.5-14B-Instruct - High-quality output
  • Llama-3.1-70B-Instruct - Top-tier performance
  • Mixtral-8x7B-Instruct-v0.1 - Mixture-of-Experts (MoE) architecture

Configuration

Basic Configuration

Configure in config.yaml or ~/.bytebuddy/config.yaml:

yaml
models:
  - name: "lmstudio-local"
    provider: "lmstudio"
    model: "local-model"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096

Specific Model Configuration

yaml
models:
  - name: "qwen-local"
    provider: "lmstudio"
    model: "Qwen2.5-7B-Instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 8192
      topP: 0.9

Multi-Model Configuration

yaml
models:
  - name: "lmstudio-fast"
    provider: "lmstudio"
    model: "Phi-3-mini-4k-instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 2048

  - name: "lmstudio-quality"
    provider: "lmstudio"
    model: "Qwen2.5-14B-Instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.5
      maxTokens: 4096

Configuration Fields

Required Fields

  • name: Unique identifier for the model configuration
  • provider: Set to "lmstudio"
  • apiBase: LM Studio server address, e.g. http://localhost:1234/v1

Optional Fields

  • model: Model name (defaults to the model currently loaded in LM Studio)
  • roles: Roles the model is used for: [chat, edit, apply, autocomplete]
  • defaultCompletionOptions:
    • temperature: Controls randomness (0-2)
    • maxTokens: Maximum number of tokens to generate
    • topP: Nucleus sampling parameter
    • topK: Number of candidate tokens considered when sampling
  • requestOptions:
    • timeout: Request timeout in milliseconds (a combined example follows this list)
Installation and Setup

1. Install LM Studio

bash
# macOS
# Download from https://lmstudio.ai

# Windows
# Download installer from https://lmstudio.ai

# Linux
# Download AppImage from https://lmstudio.ai

2. Start LM Studio

  1. Launch LM Studio application
  2. Enable server mode in settings
  3. Download and load required models
  4. Start the local server

3. Configure Server

In LM Studio:

  1. Click "Server" tab
  2. Set port (default 1234)
  3. Select model to load
  4. Click "Start Server"

4. Verify Connection

bash
# Test if server is running
curl http://localhost:1234/v1/models
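
If the models endpoint responds, you can also send a minimal chat request to the OpenAI-compatible API. The model name below is a placeholder and should match a model loaded in LM Studio:

bash
# Minimal chat completion request against the local server
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "local-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 32
  }'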

Use Case Configurations

Local Development

yaml
models:
  - name: "local-dev"
    provider: "lmstudio"
    model: "codellama-7b"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat", "edit"]
    defaultCompletionOptions:
      temperature: 0.3
      maxTokens: 2000

Privacy Protection

yaml
models:
  - name: "private-chat"
    provider: "lmstudio"
    model: "llama-3-8b"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 1000
    requestOptions:
      timeout: 60000

Remote Access

yaml
models:
  - name: "remote-lmstudio"
    provider: "lmstudio"
    apiBase: "http://192.168.1.100:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7
      maxTokens: 4096
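
When connecting from another machine, make sure the LM Studio server is reachable over the network rather than bound to localhost only (depending on your LM Studio version, there may be a setting to serve on the local network). A quick check from the client machine, assuming the host address above:

bash
# Run from the client machine; the IP must match the machine running LM Studio
curl http://192.168.1.100:1234/v1/models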

Performance Optimization

GPU Acceleration

Ensure LM Studio is using the GPU:

  1. Enable GPU in LM Studio settings
  2. Select appropriate quantization level
  3. Adjust context length

Memory Management

yaml
models:
  - name: "optimized"
    provider: "lmstudio"
    apiBase: "http://localhost:1234/v1"
    defaultCompletionOptions:
      maxTokens: 2048 # Reduce memory usage

Troubleshooting

Q: LM Studio connection fails?

A: Check the following; the commands after this list can help verify each item:

  • The LM Studio server is running
  • Port 1234 is available and matches apiBase
  • Firewall settings allow the connection
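
A quick way to check from a terminal, assuming the default port 1234:

bash
# Is the server answering?
curl http://localhost:1234/v1/models

# Is anything listening on port 1234? (macOS/Linux)
lsof -i :1234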

Q: Model loading is slow?

A:

  • Choose smaller models
  • Use quantized versions
  • Ensure sufficient RAM/VRAM

Q: Responses take too long?

A:

  • Use smaller models
  • Enable GPU acceleration
  • Reduce maxTokens setting

Best Practices

1. Model Selection

  • Fast response: Use 1.5B-7B models
  • High quality: Use 14B+ models
  • Privacy sensitive: Always use local models

2. Hardware Requirements

  • Minimum: 16GB RAM, models up to about 8GB in size
  • Recommended: 32GB RAM, GPU acceleration
  • Optimal: 64GB RAM, high-end GPU

3. Security Considerations

  • Local deployment ensures data privacy
  • Regularly update LM Studio
  • Limit network access (localhost only)

4. Performance Tuning

  • Preload frequently used models
  • Use temperature values appropriate to each role (see the sketch after this list)
  • Select appropriate quantization level
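
As a rough illustration of role-specific temperature choices (the model names and values are placeholders, not LM Studio recommendations):

yaml
models:
  - name: "lmstudio-autocomplete"
    provider: "lmstudio"
    model: "Phi-3-mini-4k-instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["autocomplete"]
    defaultCompletionOptions:
      temperature: 0.2 # low temperature for predictable completions
      maxTokens: 1024

  - name: "lmstudio-chat"
    provider: "lmstudio"
    model: "Qwen2.5-7B-Instruct"
    apiBase: "http://localhost:1234/v1"
    roles: ["chat"]
    defaultCompletionOptions:
      temperature: 0.7 # higher temperature for more varied responses
      maxTokens: 4096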