LoRA Adapters

Download the adapter weights from your open-source fine-tunes and self-host them anywhere.

What is a LoRA adapter?

When you fine-tune Qwen 3 8B, Commissioned uses QLoRA (Quantized Low-Rank Adaptation) to produce a small set of adapter weights. Instead of modifying the entire 8B-parameter model, QLoRA trains a lightweight delta that sits on top of the base model.

  • Adapter size: ~50–200 MB (vs 16+ GB for the full model)
  • Training time: ~5 minutes (vs 30–45 minutes for cloud models)
  • Portable: download the adapter and run it anywhere
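
The size gap follows directly from how LoRA factors the weight update. As a back-of-the-envelope sketch (the rank r and matrix shapes below are illustrative, not Commissioned's actual training settings):

W' = W + \frac{\alpha}{r} B A, \qquad B \in \mathbb{R}^{d \times r}, \quad A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k)

Only A and B are trained and shipped: for a d × k weight matrix they hold r(d + k) parameters instead of d·k, and the frozen base weights W never move, which is why the adapter is megabytes while the base model is gigabytes.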

Downloading

  1. Go to your dashboard
  2. Find the Qwen model card (status must be Succeeded)
  3. Click Download adapter
  4. Save the .zip file

The archive contains the LoRA weight files in a standard format compatible with most inference tools.
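
To sanity-check a download, list the archive's contents. The filenames below assume the standard Hugging Face PEFT layout, which most LoRA tooling emits; exact names can vary:

# List the archive contents without extracting
unzip -l adapter.zip

# A typical PEFT-style adapter contains:
#   adapter_config.json        base model name, LoRA rank, target modules
#   adapter_model.safetensors  the trained low-rank weight deltas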

Self-hosting

Load the adapter on top of the base Qwen 3 8B model using any of these tools:

vLLM supports LoRA adapters natively with dynamic loading.

# Start vLLM with the base model
vllm serve Qwen/Qwen3-8B \
  --enable-lora \
  --lora-modules my-adapter=/path/to/adapter

# Call with the adapter name
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-adapter", "messages": [{"role": "user", "content": "Hello"}]}'

vLLM can serve multiple LoRA adapters on a single base model, switching between them per request.
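
For example, two fine-tunes can share one base model on the same GPU (the adapter names and paths here are placeholders):

# Register several adapters at startup
vllm serve Qwen/Qwen3-8B \
  --enable-lora \
  --lora-modules support-bot=/adapters/support summarizer=/adapters/summarize

# The "model" field in each request picks the adapter
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "summarizer", "messages": [{"role": "user", "content": "Summarize this ticket"}]}'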

Ollama is the simplest option for local inference.

Create a Modelfile:

# Base model pulled from the Ollama library (Qwen 3 8B)
FROM qwen3:8b
ADAPTER /path/to/adapter

Then:

ollama create my-model -f Modelfile
ollama run my-model
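
ollama run gives you an interactive prompt; for programmatic access, the same model is also available over Ollama's local HTTP API (port 11434 by default):

# Chat with the fine-tuned model over the local API
curl http://localhost:11434/api/chat \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}], "stream": false}'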

llama.cpp is a lightweight option for CPU or GPU inference.

# Convert the PEFT adapter to GGUF (script ships with llama.cpp)
python convert_lora_to_gguf.py /path/to/adapter --outfile adapter.gguf

# Run with the base model + adapter
llama-server \
  -m qwen3-8b.gguf \
  --lora adapter.gguf \
  --port 8080
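
llama-server also exposes an OpenAI-compatible endpoint, so the same request shape used for vLLM works here:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'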

Text Generation Inference supports LoRA adapters for production serving.

# Publish the container port so local requests can reach it
docker run --gpus all \
  -p 8080:80 \
  -v /path/to/adapter:/data/adapter \
  ghcr.io/huggingface/text-generation-inference \
  --model-id Qwen/Qwen3-8B \
  --lora-adapters my-adapter=/data/adapter
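
With the container running (and its port published, as above), requests select the adapter through the adapter_id parameter. This is a sketch against TGI's generate endpoint; LoRA flags and parameters vary across TGI versions, so check the docs for yours:

curl http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello", "parameters": {"adapter_id": "my-adapter", "max_new_tokens": 64}}'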

When to self-host vs use Commissioned's API

|                | Commissioned API              | Self-hosted                    |
|----------------|-------------------------------|--------------------------------|
| Setup          | Zero; just call the endpoint  | Need GPU hardware + deployment |
| Rate limits    | Plan-based limits             | No limits (hardware-bound)     |
| Cost at scale  | Per-plan pricing              | Amortized GPU cost             |
| Data residency | Commissioned's infrastructure | Your infrastructure            |
| Maintenance    | Managed by us                 | Managed by you                 |
| Latency        | Network round-trip            | Can be local                   |

For most users, the hosted API is simpler. Self-hosting makes sense for high-volume production use, strict data residency requirements, or offline/air-gapped environments.

Availability

LoRA adapter downloads are available for Qwen 3 8B fine-tunes on all plans (including free). Cloud-provider models (OpenAI, Gemini) don't produce downloadable adapters — they're hosted exclusively by the provider.
