Data Overview
How Commissioned handles your data — from upload through training.
Your data is the most important input to a fine-tune. The quality, relevance, and volume of your data directly determine how well your model performs.
What Commissioned does with your data
When you upload files, Commissioned runs an automated pipeline:
- Parsing — extracts text from PDFs, parses JSON/JSONL structures, reads plain text and Markdown
- Cleaning — removes boilerplate, headers/footers, encoding artifacts, and noise
- Deduplication — identifies and removes repeated content that would bias training
- Formatting — converts cleaned data into the provider-specific training format (OpenAI, Gemini, or Qwen)
- Validation — checks that the result meets the provider's requirements before submitting
You don't need to do any of this yourself. Upload raw files and Commissioned handles the rest.
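For intuition, here is a minimal sketch of that kind of pipeline in Python. The function and file names are illustrative only, not Commissioned's actual internals, and it assumes your raw data has already been parsed into prompt/response pairs.

```python
import hashlib
import json

def prepare_examples(pairs):
    """Toy clean -> dedupe -> format -> validate pass over (prompt, response) pairs."""
    seen = set()
    examples = []

    for prompt, response in pairs:
        # Cleaning: collapse whitespace noise; a real pipeline also strips
        # headers/footers, boilerplate, and encoding artifacts.
        prompt, response = " ".join(prompt.split()), " ".join(response.split())
        if not prompt or not response:
            continue

        # Deduplication: drop repeated content that would bias training.
        digest = hashlib.sha256(f"{prompt}\n{response}".encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)

        # Formatting: wrap the pair in a chat-style training record.
        examples.append({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]})

    # Validation: fail early rather than submit an empty dataset.
    if not examples:
        raise ValueError("no usable training examples after cleaning")
    return examples


# Each line of the resulting .jsonl file is one JSON-encoded example.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in prepare_examples([("What is Commissioned?", "A fine-tuning service.")]):
        f.write(json.dumps(ex) + "\n")
```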
Your use-case description matters here. It tells Commissioned how to structure your data — whether to treat it as conversation pairs, reference material, stylistic examples, etc.
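As a rough illustration, the same sentence of raw text could be shaped differently depending on the use case you describe. The record shapes below are hypothetical, simplified examples rather than Commissioned's exact output format.

```python
raw = "Refunds are issued within 14 days of a return being received."

# Use case described as a customer-support assistant: the text becomes
# one side of a conversation pair for the model to imitate.
conversation_pair = {"messages": [
    {"role": "user", "content": "How long do refunds take?"},
    {"role": "assistant", "content": raw},
]}

# Use case described as reference material: the text is kept as a passage
# for the model to absorb, not a turn to reproduce verbatim.
reference_chunk = {"source": "refund-policy.md", "text": raw}
```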
Supported formats
| Format | Extension | Max size | Best for |
|---|---|---|---|
| JSONL | .jsonl | 5 GB | Pre-structured conversation data |
| JSON | .json | 5 GB | Structured data, nested content |
| PDF | .pdf | 5 GB | Documents, papers, reports |
| Plain text | .txt | 5 GB | Any unstructured text |
| Markdown | .md | 5 GB | Documentation, articles, notes |
You can upload multiple files in a single job. Mix and match formats as needed.
See File Formats for detailed guidance on each format.
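If you want to sanity-check files locally before uploading, a check like the one below mirrors the table above. The limits come from the table; the helper itself is not part of Commissioned, and the byte count assumes a decimal 5 GB.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jsonl", ".json", ".pdf", ".txt", ".md"}
MAX_BYTES = 5 * 10**9  # 5 GB per file, per the table above (assumed decimal)

def check_upload(path: str) -> None:
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"{p.name}: unsupported format '{p.suffix}'")
    if p.stat().st_size > MAX_BYTES:
        raise ValueError(f"{p.name}: larger than the 5 GB per-file limit")

check_upload("support-transcripts.jsonl")  # hypothetical filename
```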
Data privacy
- Your data is used only to train your model — it is never shared with other users or used to train other models
- Files are encrypted in transit (HTTPS) and at rest
- You can delete your models and associated data at any time
- See the Terms of Service and Privacy Policy for full details