Training Data Formatter
Convert data to JSONL format for LLM fine-tuning
What is Training Data Formatting?
LLM fine-tuning requires data in specific formats, and OpenAI, Anthropic, and other providers each have their own requirements. This formatter converts your CSV, JSON, or plain text data into the JSONL formats needed for fine-tuning.
Whether you have spreadsheet data, JSON exports, or raw text files, this tool transforms them into fine-tuning-ready JSONL with automatic field mapping.
Output Format Comparison
Legacy JSONL
Simple prompt/completion pairs. Used by older OpenAI fine-tuning and some open-source tools.
{"prompt": "...", "completion": "..."}
Chat JSONL (OpenAI)
Messages array format required by OpenAI's current fine-tuning API. Supports multi-turn conversations.
{"messages": [{"role": "user"...}, {"role": "assistant"...}]}
Supported Input Fields
| Input Field | Maps To |
|---|---|
| prompt, input, question | User message |
| completion, output, answer, response | Assistant message |
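The mapping in the table can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the helper names `first_present` and `to_chat_line` are hypothetical, and the order in which alias fields are checked is an assumption.

```python
import json

# Field aliases taken from the table above; the lookup order is an assumption.
USER_FIELDS = ("prompt", "input", "question")
ASSISTANT_FIELDS = ("completion", "output", "answer", "response")


def first_present(record, names):
    """Return the value of the first listed field found in record, else None."""
    for name in names:
        if name in record:
            return record[name]
    return None


def to_chat_line(record):
    """Map one input record to a single Chat JSONL line (a JSON string)."""
    user = first_present(record, USER_FIELDS)
    assistant = first_present(record, ASSISTANT_FIELDS)
    if user is None or assistant is None:
        raise ValueError(f"record lacks a recognized field: {sorted(record)}")
    messages = [
        {"role": "user", "content": str(user)},
        {"role": "assistant", "content": str(assistant)},
    ]
    return json.dumps({"messages": messages}, ensure_ascii=False)


# Hypothetical sample rows using two different alias pairs.
rows = [
    {"question": "What is JSONL?", "answer": "One JSON object per line."},
    {"input": "2 + 2", "output": "4"},
]
chat_lines = [to_chat_line(row) for row in rows]
print("\n".join(chat_lines))
```

Each element of `chat_lines` is one line of the output file. Records missing both a user-side and an assistant-side field raise an error here rather than being silently skipped, which is a design choice of this sketch.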
Best Practices
- Validate JSON: Ensure each line is valid JSON before uploading to fine-tuning APIs.
- Consistent formatting: Use the same structure for all examples in your dataset.
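- Add system prompts: For the chat format, consider adding a system message to each example. This tool only creates user/assistant pairs, so system messages must be added separately.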
- Check token counts: Verify examples don't exceed the model's context limit.
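The first two checks above can be automated before uploading. The sketch below is one possible approach using only Python's standard `json` module. The function name `validate_jsonl` and the `max_chars` parameter are hypothetical, and character length is only a rough stand-in for a real token count, which depends on the model's tokenizer.

```python
import json


def validate_jsonl(text, max_chars=None):
    """Return a list of (line_number, problem) tuples; an empty list means no issues found."""
    problems = []
    shapes = set()  # distinct sets of top-level keys seen so far
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((number, "not a JSON object"))
            continue
        shapes.add(tuple(sorted(obj)))
        # Rough size check: characters, not tokens (an assumption of this sketch).
        if max_chars is not None and len(line) > max_chars:
            problems.append((number, "exceeds max_chars"))
    if len(shapes) > 1:
        # Dataset-level issue, so there is no single line number to report.
        problems.append((None, "inconsistent top-level keys across lines"))
    return problems


sample = '{"prompt": "a", "completion": "b"}\n{not json}\n{"messages": []}'
for line_number, problem in validate_jsonl(sample):
    print(line_number, problem)
```

For an accurate token check, replace the character comparison with a count from the tokenizer that matches your target model.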
FAQ
Which format should I use for OpenAI?
Use Chat JSONL for all current OpenAI fine-tuning (gpt-3.5-turbo, gpt-4). The legacy prompt/completion format is deprecated.
How do I add system prompts?
This tool creates user/assistant pairs. Add system messages manually to the output, or include a "system" field in your input JSON.
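If you choose to add system messages to the output yourself, a short script can do it in one pass. The following is a sketch under the assumption that the output is Chat JSONL with a `messages` array on every line; the function name `add_system_message` is hypothetical.

```python
import json


def add_system_message(jsonl_text, system_prompt):
    """Prepend a system message to every Chat JSONL example that lacks one."""
    out = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        example = json.loads(line)
        messages = example.get("messages", [])
        # Leave examples alone if they already start with a system message.
        if not messages or messages[0].get("role") != "system":
            messages = [{"role": "system", "content": system_prompt}] + messages
        example["messages"] = messages
        out.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(out)


source = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]}'
print(add_system_message(source, "You are a concise assistant."))
```

Running the function twice on the same data does not add a second system message, so it is safe to reapply after regenerating part of a dataset.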
