
Data Overview

How Commissioned handles your data — from upload through training.

Your data is the most important input to a fine-tune. The quality, relevance, and volume of your data directly determine how well your model performs.

What Commissioned does with your data

When you upload files, Commissioned runs an automated pipeline:

  1. Parsing — extracts text from PDFs, parses JSON/JSONL structures, reads plain text and Markdown
  2. Cleaning — removes boilerplate, headers/footers, encoding artifacts, and noise
  3. Deduplication — identifies and removes repeated content that would bias training
  4. Formatting — converts cleaned data into the provider-specific training format (OpenAI, Gemini, or Qwen)
  5. Validation — checks that the result meets the provider's requirements before submitting

You don't need to do any of this yourself. Upload raw files and Commissioned handles the rest.
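
To make the stages concrete, here is a minimal sketch in Python of what one pass through such a pipeline might do. The function names, cleaning rules, and record shape are illustrative assumptions only, not Commissioned's actual implementation.

```python
import hashlib
import json
import re

def clean(text: str) -> str:
    """Strip whitespace noise and encoding artifacts (illustrative rules only)."""
    text = re.sub(r"\s+", " ", text)   # collapse runs of whitespace
    text = text.replace("\ufffd", "")  # drop Unicode replacement characters
    return text.strip()

def deduplicate(records: list[str]) -> list[str]:
    """Drop exact duplicates by hashing each cleaned record."""
    seen, unique = set(), []
    for record in records:
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique

def to_training_lines(records: list[str]) -> list[str]:
    """Serialize records as JSONL; real provider formats are richer than this."""
    return [json.dumps({"text": record}) for record in records]

def validate(lines: list[str]) -> bool:
    """Minimal check: every line parses as JSON and has non-empty text."""
    return all(json.loads(line).get("text") for line in lines)

# raw_texts stands in for the output of the parsing step (PDF, JSON/JSONL, text, Markdown).
raw_texts = ["Example  passage one.", "Example passage one.", "Example passage two."]
cleaned = [clean(t) for t in raw_texts]
unique = deduplicate(cleaned)          # the repeated passage is removed here
lines = to_training_lines(unique)
assert validate(lines)
```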

Your use-case description matters here. It tells Commissioned how to structure your data — whether to treat it as conversation pairs, reference material, stylistic examples, etc.
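
For example, the same passage could be shaped differently depending on that description. The sketch below is illustrative only: the messages structure mirrors OpenAI's chat fine-tuning format, while the plain-text record is a simplified stand-in; the exact records Commissioned produces for each provider may differ.

```python
import json

passage = "Returns are accepted within 30 days of purchase with a valid receipt."

# Use case: "answer customer questions about our return policy"
# -> shape the passage as a conversation pair (question + answer).
conversation_record = {
    "messages": [
        {"role": "user", "content": "What is your return policy?"},
        {"role": "assistant", "content": passage},
    ]
}

# Use case: "match the tone of our internal documentation"
# -> keep the passage as a plain reference/style example instead.
reference_record = {"text": passage}

print(json.dumps(conversation_record))
print(json.dumps(reference_record))
```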

Supported formats

Format       Extension   Max size   Best for
JSONL        .jsonl      5 GB       Pre-structured conversation data
JSON         .json       5 GB       Structured data, nested content
PDF          .pdf        5 GB       Documents, papers, reports
Plain text   .txt        5 GB       Any unstructured text
Markdown     .md         5 GB       Documentation, articles, notes

You can upload multiple files in a single job. Mix and match formats as needed.

See File Formats for detailed guidance on each format.

Data privacy

  • Your data is used only to train your model — it is never shared with other users or used to train other models
  • Files are encrypted in transit (HTTPS) and at rest
  • You can delete your models and associated data at any time
  • See the Terms of Service and Privacy Policy for full details
