File Formats
Detailed guidance on each supported file format and how to get the best results.
JSONL (recommended for structured data)
JSONL (JSON Lines) is the recommended format for fine-tuning data. Each line is a separate JSON object, which makes it easy to represent conversations, Q&A pairs, or structured examples.
{"messages": [{"role": "user", "content": "How do I reset my password?"}, {"role": "assistant", "content": "Go to Settings > Account > Reset Password. You'll receive an email with a reset link."}]}
{"messages": [{"role": "user", "content": "What are your business hours?"}, {"role": "assistant", "content": "We're open Monday through Friday, 9am to 6pm EST."}]}
{"question": "What's the return policy?", "answer": "You can return items within 30 days of purchase for a full refund."}
{"question": "Do you ship internationally?", "answer": "Yes, we ship to over 50 countries. Shipping costs vary by destination."}
{"instruction": "Summarize this article in 3 bullet points", "input": "The article text goes here...", "output": "• Point one\n• Point two\n• Point three"}
{"instruction": "Translate to French", "input": "Hello, how are you?", "output": "Bonjour, comment allez-vous?"}
You don't need to follow any specific schema. Commissioned detects the structure of your JSONL and formats it appropriately for the target model.
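Because each line has to be a complete JSON object on its own, it can help to check a file locally before uploading it. The following is a minimal sketch in Python using only the standard library; the function name `check_jsonl` and the sample lines are illustrative assumptions, not part of Commissioned.

```python
import json


def check_jsonl(lines):
    """Return (line_number, problem) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are ignored rather than reported
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical sample: one good line, two objects fused on one line, one array.
sample = [
    '{"question": "Do you ship internationally?", "answer": "Yes."}',
    '{"question": "A"}{"question": "B"}',
    '["not", "an", "object"]',
]
problems = check_jsonl(sample)
print(problems)
```

To check a real file, pass an open file handle instead of a list, for example `check_jsonl(open("data.jsonl", encoding="utf-8"))`, where `data.jsonl` stands in for your own file name.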
JSON
Standard JSON files work for structured data. Commissioned extracts text content from nested objects and arrays.
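To preview which text a nested JSON file contains, a recursive walk over objects and arrays is one simple approach. This is a sketch under assumptions: the helper `collect_text` is hypothetical, it keeps every string value and skips numbers, booleans, and null, and Commissioned's own extraction may behave differently.

```python
import json


def collect_text(node):
    """Yield every string value found in nested dicts and lists, in document order."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_text(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_text(item)
    # Numbers, booleans, and null are skipped in this sketch.


# Hypothetical input with a nested array and a non-text field.
raw = '{"documents": [{"title": "FAQ", "tags": ["billing", "refunds"], "pages": 3}]}'
texts = list(collect_text(json.loads(raw)))
print(texts)  # ['FAQ', 'billing', 'refunds']
```

A file in the shape of this section's example has two documents, each with a title and a content string.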
{
  "documents": [
    {
      "title": "Getting Started Guide",
      "content": "Welcome to our platform. Here's how to get started..."
    },
    {
      "title": "API Reference",
      "content": "Our API follows RESTful conventions..."
    }
  ]
}
PDF
PDFs are useful for training on existing documents such as research papers, manuals, reports, and slide decks.
Commissioned extracts text from PDFs automatically. A few things to know:
- Text-based PDFs work best — the text is extracted directly
- Scanned PDFs (images of text) have lower extraction quality
- Tables and charts — text in tables is extracted; charts and images are skipped
- Headers and footers — repetitive elements are automatically detected and removed
If your PDFs are mostly images or scanned documents, consider converting them to text first for better results.
Plain text (.txt)
The simplest format. Dump any text into a .txt file and upload it.
Good for:
- Email archives (copy-paste from your email client)
- Chat logs (export from Slack, Discord, etc.)
- Writing samples (essays, articles, stories)
- Code (with or without comments)
Markdown (.md)
Markdown files preserve some structure (headings, lists, code blocks) which helps Commissioned understand the hierarchy of your content.
Ideal for:
- Documentation and wikis
- Blog posts and articles
- Technical notes
- README files and guides
Mixing formats
You can upload multiple files in different formats in a single job. For example:
- Upload your documentation as Markdown files
- Add customer support transcripts as a JSONL file
- Include product specs as a PDF
Commissioned processes each file according to its format and combines the results into a unified training dataset.
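Before starting a mixed-format job, it can be handy to sort local files by format and spot anything outside the formats covered on this page. The sketch below assumes the extensions `.jsonl`, `.json`, `.pdf`, `.txt`, and `.md`; the `SUPPORTED` mapping, the `group_by_format` helper, and the file names are hypothetical and say nothing about how Commissioned itself detects formats.

```python
from pathlib import Path

# Assumption: extensions for the formats described on this page.
SUPPORTED = {
    ".jsonl": "JSONL",
    ".json": "JSON",
    ".pdf": "PDF",
    ".txt": "Plain text",
    ".md": "Markdown",
}


def group_by_format(paths):
    """Group file names by format and list names with an unrecognized extension."""
    grouped = {}
    unsupported = []
    for path in map(Path, paths):
        fmt = SUPPORTED.get(path.suffix.lower())  # case-insensitive match
        if fmt is None:
            unsupported.append(path.name)
        else:
            grouped.setdefault(fmt, []).append(path.name)
    return grouped, unsupported


grouped, unsupported = group_by_format(
    ["docs/guide.md", "support.jsonl", "specs.PDF", "notes.docx"]
)
print(grouped)
print(unsupported)
```

Files that land in the second list could be converted to one of the formats above, such as plain text, before uploading.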