Chunking Calculator
Preview text chunking strategies for RAG
What is Text Chunking?
Chunking divides long documents into smaller pieces for embedding and retrieval in RAG (Retrieval-Augmented Generation) systems. The right chunking strategy dramatically impacts retrieval quality—chunks that are too large lose precision, while chunks that are too small lose context.
This calculator lets you preview different chunking strategies on your text before implementing them in your pipeline.
Chunking Methods
Fixed Size
Split at exact token count with configurable overlap. Predictable sizes but may break mid-sentence.
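A minimal Python sketch of fixed-size splitting, assuming the text has already been tokenized into a list (the function name and defaults are illustrative, not tied to any particular library):

```python
def fixed_size_chunks(tokens, chunk_size=256, overlap=32):
    """Split a token list into fixed-size chunks; each chunk re-reads
    the last `overlap` tokens of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # final chunk reached; avoid emitting a tiny tail-only chunk
    return chunks
```

With real text you would tokenize first (e.g. with your embedding model's tokenizer) and detokenize each chunk before embedding; the list-slicing logic stays the same.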
Sentence-based
Groups complete sentences up to target size. Better semantic coherence.
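A rough sketch of sentence-based grouping; the regex sentence splitter is a simple heuristic (it will mis-split abbreviations like "e.g.") and the word-count budget stands in for a real token count:

```python
import re

def sentence_chunks(text, max_words=50):
    """Group whole sentences into chunks of at most max_words words,
    never breaking inside a sentence."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))  # flush the full chunk
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Production pipelines typically swap the regex for a proper sentence segmenter and count tokens instead of words, but the grouping loop is the core of the strategy.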
Paragraph-based
Natural document structure. Best for well-formatted content with clear sections.
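Paragraph-based splitting can be as simple as splitting on blank lines; this sketch assumes paragraphs are separated by double newlines, which holds for most well-formatted plain text and Markdown:

```python
def paragraph_chunks(text):
    """Treat each blank-line-separated paragraph as one chunk,
    dropping empty fragments."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```

In practice you may still need to merge very short paragraphs or sub-split very long ones against a token budget, often by falling back to the sentence-based method above.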
FAQ
What's a good chunk size?
Start with 256-512 tokens for general text. Use smaller chunks (128-256 tokens) for Q&A-style content, where precise retrieval matters, and larger ones (512-1024) for complex documents that need more surrounding context.
Why use overlap?
Overlap (typically 10-20% of the chunk size) repeats the end of one chunk at the start of the next, so context at chunk boundaries isn't lost. This improves retrieval when the answer spans a chunk break.
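To see the effect of a percentage-based overlap, here is an illustrative sketch over word tokens (function name and defaults are made up for this example); each chunk's first word repeats the previous chunk's last word, so a sentence straddling the boundary survives in at least one chunk:

```python
def percent_overlap_chunks(words, chunk_size, overlap_pct=0.15):
    """Chunk a word list; each chunk repeats the final `overlap`
    words of the previous chunk (overlap = chunk_size * overlap_pct)."""
    overlap = max(1, round(chunk_size * overlap_pct))
    step = chunk_size - overlap
    return [words[i:i + chunk_size] for i in range(0, len(words) - overlap, step)]
```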
