LoRA Adapters
Download and self-host the adapter weights from fine-tunes of open-source models.
What is a LoRA adapter?
When you fine-tune Qwen 3 8B, Commissioned uses QLoRA (Quantized Low-Rank Adaptation) to produce a small set of adapter weights. Instead of modifying the entire 8B-parameter model, QLoRA trains a lightweight delta that sits on top of the base model: for each targeted layer, a pair of small low-rank matrices whose product approximates the weight update that full fine-tuning would have made.
- Adapter size: ~50–200 MB (vs 16+ GB for the full model)
- Training time: ~5 minutes (vs 30–45 minutes for cloud models)
- Portable: download the adapter and run it anywhere
Downloading
1. Go to your dashboard
2. Find the Qwen model card (status must be Succeeded)
3. Click Download adapter
4. Save the `.zip` file
The archive contains the LoRA weight files in a standard format compatible with most inference tools.
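To sanity-check a download before deploying it, list the archive contents. A minimal sketch, assuming the archive follows the usual Hugging Face PEFT layout (the archive name here is a placeholder, and exact filenames can vary by PEFT version):

```bash
# List the files inside the downloaded archive
unzip -l adapter.zip

# A PEFT-style LoRA adapter typically contains:
#   adapter_config.json         rank, alpha, and target modules
#   adapter_model.safetensors   the trained low-rank weights
```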
Self-hosting
Load the adapter on top of the base Qwen 3 8B model using any of these tools:
vLLM supports LoRA adapters natively with dynamic loading.
```bash
# Start vLLM with the base model
vllm serve Qwen/Qwen3-8B \
  --enable-lora \
  --lora-modules my-adapter=/path/to/adapter
```
```bash
# Call with the adapter name
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-adapter", "messages": [{"role": "user", "content": "Hello"}]}'
```

vLLM can serve multiple LoRA adapters on a single base model, switching between them per request.
Ollama is the simplest option for local inference.
Create a Modelfile:
```
# qwen3:8b is the Ollama library tag for the base model
FROM qwen3:8b
ADAPTER /path/to/adapter
```

Then:
```bash
ollama create my-model -f Modelfile
ollama run my-model
```
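The created model is also reachable through Ollama's local REST API (port 11434 by default), which is convenient for wiring the fine-tune into an application:

```bash
# Chat with the adapter-backed model over the Ollama API
curl http://localhost:11434/api/chat -d '{
  "model": "my-model",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
```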
llama.cpp is a lightweight option for CPU or GPU inference.

```bash
# Convert adapter if needed (the script ships with llama.cpp)
python convert_lora_to_gguf.py /path/to/adapter

# Run with the base model + adapter
llama-server \
  -m qwen3-8b.gguf \
  --lora adapter.gguf \
  --port 8080
```
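Once running, llama-server exposes an OpenAI-compatible endpoint on the port given above, so the same client code works against vLLM and llama.cpp:

```bash
# Query the adapter-patched model (single model, so no model field is needed)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```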
Text Generation Inference supports LoRA adapters for production serving.

```bash
# Publish the container's port 80 on localhost:8080
docker run --gpus all \
  -p 8080:80 \
  -v /path/to/adapter:/data/adapter \
  ghcr.io/huggingface/text-generation-inference \
  --model-id Qwen/Qwen3-8B \
  --lora-adapters my-adapter=/data/adapter
```
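With the container's port published as above, requests can target the adapter by name through TGI's `adapter_id` parameter. A sketch, assuming a recent image with multi-LoRA support:

```bash
# Route a generation request to the named adapter
curl http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello", "parameters": {"adapter_id": "my-adapter"}}'
```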
When to self-host vs use Commissioned's API

| | Commissioned API | Self-hosted |
|---|---|---|
| Setup | Zero — just call the endpoint | Need GPU hardware + deployment |
| Rate limits | Plan-based limits | No limits (hardware-bound) |
| Cost at scale | Per-plan pricing | Amortized GPU cost |
| Data residency | Commissioned's infrastructure | Your infrastructure |
| Maintenance | Managed by us | Managed by you |
| Latency | Network round-trip | Can be local |
For most users, the hosted API is simpler. Self-hosting makes sense for high-volume production use, strict data residency requirements, or offline/air-gapped environments.
Availability
LoRA adapter downloads are available for Qwen 3 8B fine-tunes on all plans (including free). Cloud-provider models (OpenAI, Gemini) don't produce downloadable adapters — they're hosted exclusively by the provider.