How Fine-Tuning Works

The end-to-end pipeline from data upload to deployed model.

The pipeline

Data ingestion

Your files are parsed based on their format (a minimal dispatch sketch follows the list):

  • PDF → text extraction (handles multi-column layouts, tables, headers/footers)
  • JSONL → each line parsed as a JSON object
  • JSON → recursive text extraction from nested structures
  • TXT / Markdown → read as-is with encoding normalization
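
For intuition, here's a minimal sketch of what format-based dispatch can look like. The function names and normalization details are illustrative, not Commissioned's actual parser, and PDF extraction in particular needs a dedicated library:

```python
import json
from pathlib import Path

def walk_strings(node):
    """Recursively yield string leaves from nested JSON structures."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from walk_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk_strings(item)

def extract_text(path: Path) -> list[str]:
    """Route a file to a format-specific extractor (illustrative only)."""
    suffix = path.suffix.lower()
    if suffix == ".jsonl":
        # Each non-empty line must parse as an independent JSON object.
        return [json.dumps(json.loads(line))
                for line in path.read_text(encoding="utf-8").splitlines()
                if line.strip()]
    if suffix == ".json":
        return list(walk_strings(json.loads(path.read_text(encoding="utf-8"))))
    if suffix in {".txt", ".md"}:
        # Read as-is, normalizing any stray bytes to valid UTF-8.
        return [path.read_bytes().decode("utf-8", errors="replace")]
    if suffix == ".pdf":
        raise NotImplementedError("PDF extraction requires a dedicated parser")
    raise ValueError(f"unsupported format: {suffix}")
```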

Cleaning and formatting

Raw extracted text goes through an automated pipeline:

  • Deduplication — exact and near-duplicate content is removed (sketched after this section)
  • Noise removal — boilerplate, encoding artifacts, and irrelevant metadata are stripped
  • Structuring — content is organized into training examples based on your use-case description
  • Provider formatting — examples are converted into the specific format required by OpenAI, Gemini, or Qwen

Your use-case description is critical here — it tells the pipeline whether to treat your data as conversation pairs, reference material, stylistic examples, or something else.
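
As an illustration of the deduplication step, here's a sketch that drops exact duplicates and trivially near-duplicate copies (same text modulo case and whitespace). Production near-duplicate detection typically uses shingling or MinHash; this only shows the idea:

```python
import hashlib

def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivially different
    # copies of the same content hash to the same key.
    return " ".join(text.lower().split())

def dedupe(examples: list[str]) -> list[str]:
    """Keep the first occurrence of each (normalized) example."""
    seen: set[str] = set()
    kept: list[str] = []
    for example in examples:
        key = hashlib.sha256(normalize(example).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(example)
    return kept
```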

Validation

Before submitting to the provider, the formatted data is validated:

  • Minimum example count is met
  • Token limits per example aren't exceeded
  • Required fields are present
  • Format matches the provider's specification

If validation fails, the job moves to Failed status with an error message; a sketch of these checks follows below.
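
A sketch of what such pre-submission checks can look like. The thresholds, the `messages` field name, and the whitespace-based token estimate are assumptions for illustration; real validators use the provider's documented limits and the model's tokenizer:

```python
def validate(examples: list[dict], *, min_count: int = 10,
             max_tokens: int = 4096) -> list[str]:
    """Return a list of human-readable validation errors (empty = valid)."""
    errors: list[str] = []
    if len(examples) < min_count:
        errors.append(f"need at least {min_count} examples, got {len(examples)}")
    for i, example in enumerate(examples):
        if "messages" not in example:  # e.g. OpenAI chat format
            errors.append(f"example {i}: missing required 'messages' field")
            continue
        # Crude whitespace token estimate; real checks use the tokenizer.
        approx_tokens = sum(len((m.get("content") or "").split())
                            for m in example["messages"])
        if approx_tokens > max_tokens:
            errors.append(f"example {i}: ~{approx_tokens} tokens exceeds {max_tokens}")
    return errors
```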

Training

The validated data is submitted to the target provider:

| Provider | What happens |
|---|---|
| OpenAI | Data sent to OpenAI's fine-tuning API. Training runs on OpenAI's infrastructure. |
| Google Gemini | Data sent to Gemini's tuning API. Training runs on Google Cloud. |
| GPU (Qwen) | QLoRA training runs on Commissioned's own GPU cluster. |
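
Commissioned handles submission for you, but for reference, this is roughly what a direct submission to OpenAI's fine-tuning API looks like with the official Python SDK (the file name and base model below are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the formatted JSONL, then start a fine-tuning job against it.
upload = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=upload.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder base model
)
print(job.id, job.status)
```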

Training times:

  • Cloud models: 30–45 minutes (occasionally longer during high demand)
  • GPU / Qwen: ~5 minutes

Deployment

Once training completes:

  • Cloud models are immediately available through the provider's inference infrastructure. Commissioned routes your chat and API requests to the fine-tuned model.
  • GPU models produce a LoRA adapter that's stored on Commissioned's infrastructure. The model is served for chat and API, and the adapter is available for download (see the loading sketch below).
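
Because the GPU path produces a standard LoRA adapter, you can also load a downloaded adapter yourself. A sketch using the Hugging Face `transformers` and `peft` libraries; the base model id and adapter path are placeholders, and the base must match the model the adapter was trained on:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct"  # placeholder: must match the training base
adapter_dir = "./downloaded_adapter"   # placeholder: your downloaded adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_dir)  # apply the LoRA weights

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```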

Job lifecycle

Every fine-tuning job goes through these statuses:

```
Queued → Validating files → In progress → Succeeded
                                        ↘ Failed
                                        ↘ Cancelled
```

| Status | Duration | What's happening |
|---|---|---|
| Queued | Seconds | Job is in the queue, waiting to start |
| Validating files | 1–5 minutes | Data is being parsed, cleaned, and formatted |
| In progress | 5–45 minutes | Model is actively training |
| Succeeded | Terminal | Model is ready to use |
| Failed | Terminal | Something went wrong (data issue or provider error) |
| Cancelled | Terminal | You cancelled the job |
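
Encoded as a small state machine, the lifecycle looks like the sketch below. The status strings and the exact transition set (for instance, whether a queued job can still be cancelled) are assumptions read off the diagram above:

```python
from enum import Enum

class JobStatus(Enum):
    QUEUED = "queued"
    VALIDATING_FILES = "validating_files"
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"

# Transitions implied by the lifecycle diagram; terminal states
# (Succeeded, Failed, Cancelled) have no outgoing edges.
TRANSITIONS = {
    JobStatus.QUEUED: {JobStatus.VALIDATING_FILES, JobStatus.CANCELLED},
    JobStatus.VALIDATING_FILES: {JobStatus.IN_PROGRESS, JobStatus.FAILED,
                                 JobStatus.CANCELLED},
    JobStatus.IN_PROGRESS: {JobStatus.SUCCEEDED, JobStatus.FAILED,
                            JobStatus.CANCELLED},
    JobStatus.SUCCEEDED: set(),
    JobStatus.FAILED: set(),
    JobStatus.CANCELLED: set(),
}

def can_transition(src: JobStatus, dst: JobStatus) -> bool:
    return dst in TRANSITIONS[src]
```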

Polling and notifications

You don't need to watch the dashboard. Commissioned:

  • Polls the provider for status updates automatically (see the loop sketched after this list)
  • Sends you an email notification when training completes
  • Updates the model card on your dashboard in real time
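
If you ever drive a provider's API directly instead of going through Commissioned, the equivalent of this polling is a simple loop. Shown for OpenAI's SDK; the 30-second interval is arbitrary:

```python
import time
from openai import OpenAI

client = OpenAI()

def wait_for_job(job_id: str, interval: float = 30.0) -> str:
    """Block until the fine-tuning job reaches a terminal status."""
    terminal = {"succeeded", "failed", "cancelled"}
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in terminal:
            return job.status
        time.sleep(interval)
```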

What affects training quality

| Factor | Impact |
|---|---|
| Data quality | The most important factor. Clean, consistent, relevant data produces better models. |
| Data volume | More examples generally help, but quality matters more than quantity. |
| Use-case description | Guides how data is structured — a specific description leads to better training data. |
| Base model choice | Different models have different strengths. See Models. |

Fine-tuning is iterative. Train a model, test it, identify gaps, add more data, and train again. Each iteration typically improves results.
