LoRA Adapters
Download and self-host the adapter weights from fine-tunes of open-source models.
What is a LoRA adapter?
When you fine-tune Qwen 3 8B, Commissioned uses QLoRA (Quantized Low-Rank Adaptation) to produce a small set of adapter weights. Instead of modifying the entire 8B-parameter model, QLoRA trains a lightweight delta that sits on top of the base model: for each targeted layer, a pair of small low-rank matrices whose product approximates the weight update that full fine-tuning would have made.
- Adapter size: ~50–200 MB (vs 16+ GB for the full model)
- Training time: ~5 minutes (vs 30–45 minutes for cloud models)
- Portable: download the adapter and run it anywhere
Downloading
1. Go to your dashboard
2. Find the Qwen model card (status must be Succeeded)
3. Click Download adapter
4. Save the `.zip` file
The archive contains the LoRA weight files in a standard format compatible with most inference tools.
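To sanity-check a download before deploying it, list the archive contents. A minimal sketch, assuming the archive follows the usual Hugging Face PEFT layout (the archive name here is a placeholder, and exact filenames can vary by PEFT version):

```bash
# List the files inside the downloaded archive
unzip -l adapter.zip

# A PEFT-style LoRA adapter typically contains:
#   adapter_config.json         rank, alpha, and target modules
#   adapter_model.safetensors   the trained low-rank weights
```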
Self-hosting
Load the adapter on top of the base Qwen 3 8B model using any of these tools:
vLLM supports LoRA adapters natively with dynamic loading.
```bash
# Start vLLM with the base model
vllm serve Qwen/Qwen3-8B \
  --enable-lora \
  --lora-modules my-adapter=/path/to/adapter
```
```bash
# Call with the adapter name
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-adapter", "messages": [{"role": "user", "content": "Hello"}]}'
```

vLLM can serve multiple LoRA adapters on a single base model, switching between them per request.
Ollama is the simplest option for local inference.
Create a Modelfile:
```
# qwen3:8b is the Ollama library tag for the base model
FROM qwen3:8b
ADAPTER /path/to/adapter
```

Then:
```bash
ollama create my-model -f Modelfile
ollama run my-model
```
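The created model is also reachable through Ollama's local REST API (port 11434 by default), which is convenient for wiring the fine-tune into an application:

```bash
# Chat with the adapter-backed model over the Ollama API
curl http://localhost:11434/api/chat -d '{
  "model": "my-model",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'
```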
llama.cpp is a lightweight option for CPU or GPU inference.

```bash
# Convert adapter if needed (the script ships with llama.cpp)
python convert_lora_to_gguf.py /path/to/adapter

# Run with the base model + adapter
llama-server \
  -m qwen3-8b.gguf \
  --lora adapter.gguf \
  --port 8080
```
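Once running, llama-server exposes an OpenAI-compatible endpoint on the port given above, so the same client code works against vLLM and llama.cpp:

```bash
# Query the adapter-patched model (single model, so no model field is needed)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```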
Text Generation Inference supports LoRA adapters for production serving.

```bash
# Publish the container's port 80 on localhost:8080
docker run --gpus all \
  -p 8080:80 \
  -v /path/to/adapter:/data/adapter \
  ghcr.io/huggingface/text-generation-inference \
  --model-id Qwen/Qwen3-8B \
  --lora-adapters my-adapter=/data/adapter
```
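With the container's port published as above, requests can target the adapter by name through TGI's `adapter_id` parameter. A sketch, assuming a recent image with multi-LoRA support:

```bash
# Route a generation request to the named adapter
curl http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello", "parameters": {"adapter_id": "my-adapter"}}'
```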
When to self-host vs use Commissioned's API

| | Commissioned API | Self-hosted |
|---|---|---|
| Setup | Zero — just call the endpoint | Need GPU hardware + deployment |
| Rate limits | Plan-based limits | No limits (hardware-bound) |
| Cost at scale | Per-plan pricing | Amortized GPU cost |
| Data residency | Commissioned's infrastructure | Your infrastructure |
| Maintenance | Managed by us | Managed by you |
| Latency | Network round-trip | Can be local |
For most users, the hosted API is simpler. Self-hosting makes sense for high-volume production use, strict data residency requirements, or offline/air-gapped environments.
Availability
LoRA adapter downloads are available for Qwen 3 8B fine-tunes on all plans (including free). Cloud-provider models (OpenAI, Gemini) don't produce downloadable adapters — they're hosted exclusively by the provider.