Top-P & Top-K Explainer
Visualize how nucleus sampling (top-p) and top-k filtering affect token selection
Sampling Parameters
Top-P (0.9): selects tokens until their cumulative probability reaches 90%
Top-K (40): selects only the 40 most likely tokens
Token Probability Distribution
Summary
Tokens in candidate pool: 9
Probability mass covered: 91.0%
What is Top-P (Nucleus Sampling)?
Top-P, also called nucleus sampling, restricts sampling to the smallest set of tokens whose cumulative probability meets or exceeds P. For example, top-p=0.9 means the model considers only the most likely tokens that together account for at least 90% of the probability mass.
This adapts dynamically to the distribution: when the model is confident, only a few tokens make the cut; when it is uncertain, more tokens are included.
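A minimal sketch of the top-p cut in plain Python may help. The token names, probabilities, and the `top_p_filter` helper below are made-up illustrations, not the tool's actual distribution or any library's API:

```python
# Illustrative nucleus (top-p) filter over an already-normalized distribution.
# Token names and probabilities are hypothetical.
def top_p_filter(probs, p=0.9):
    # Sort tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= p:        # stop once the nucleus covers at least p
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    total = sum(prob for _, prob in pool)
    return {token: prob / total for token, prob in pool}

example = {"the": 0.50, "a": 0.24, "an": 0.12, "this": 0.08, "that": 0.04, "one": 0.02}
print(top_p_filter(example, p=0.9))   # keeps "the", "a", "an", "this" (94% of the mass)
```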
What is Top-K?
Top-K simply limits selection to the K most probable tokens, regardless of their actual probabilities. For example, top-k=40 means only the 40 most likely tokens are candidates for selection.
This is simpler but less adaptive: the candidate pool always contains exactly K tokens, whether the model is confident or uncertain.
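The equivalent top-k step is even simpler. This sketch reuses the same kind of made-up distribution with a small k so the cut is visible:

```python
# Illustrative top-k filter: keep only the k most likely tokens, then renormalize.
# Token names and probabilities are hypothetical.
def top_k_filter(probs, k=40):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool = ranked[:k]              # exactly k tokens (fewer only if the vocab is smaller)
    total = sum(prob for _, prob in pool)
    return {token: prob / total for token, prob in pool}

example = {"the": 0.50, "a": 0.24, "an": 0.12, "this": 0.08, "that": 0.04, "one": 0.02}
print(top_k_filter(example, k=3))  # keeps "the", "a", "an" regardless of how the mass is spread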
Comparison
| Aspect | Top-P | Top-K |
|---|---|---|
| Basis | Cumulative probability | Token count |
| Adaptive? | Yes - fewer tokens when confident | No - always K tokens |
| Common values | 0.9, 0.95 | 40, 50, 100 |
| Best for | Most use cases | Very large vocabularies |
Pro Tip: Use Top-P Alone
OpenAI recommends adjusting either temperature OR top-p, not both. Top-p=0.9 with temperature=1.0 usually gives good results. Adding top-k on top is rarely necessary.
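To make the tip concrete, here is a rough end-to-end sketch of one sampling step that uses temperature and top-p only. The logits are invented and the `sample_top_p` helper is ours, not any particular library's API:

```python
import math
import random

# Hypothetical logits for the next token; in a real model these come from the network.
logits = {"the": 2.1, "a": 1.4, "an": 0.7, "this": 0.3, "that": -0.2}

def sample_top_p(logits, temperature=1.0, top_p=0.9):
    # 1. Temperature scaling, then softmax to get a probability distribution.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / z for t, v in scaled.items()}
    # 2. Nucleus (top-p) cut: keep the smallest high-probability set covering top_p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # 3. Sample from the pool; random.choices accepts unnormalized weights.
    tokens, weights = zip(*pool)
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_top_p(logits, temperature=1.0, top_p=0.9))
```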
Frequently Asked Questions
Should I use top-p or top-k?
Top-p is generally preferred because it adapts to the model's confidence. Top-k is useful when you want a fixed, predictable cap on the number of candidate tokens.
What's a good top-p value?
0.9 is a common default. Lower values (0.7-0.8) reduce randomness, while higher values (0.95-1.0) allow more variety.
Can I use both together?
Yes, they can be combined. The model will use whichever filter is more restrictive at each step. However, using one at a time is usually clearer and easier to tune.
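If you do combine them, the effective candidate pool is whichever filter is stricter at that step. A small hedged sketch of one common ordering (top-k first, then the nucleus cut within that set); libraries differ in exactly how the two cuts interact, and the probabilities and parameter values are made up:

```python
# Illustrative combination: apply top-k first, then top-p within the top-k set.
# Token names, probabilities, and parameter values are hypothetical.
def combined_filter(probs, k=40, p=0.9):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    top_k_pool = ranked[:k]                     # at most k tokens
    pool, cumulative = [], 0.0
    for token, prob in top_k_pool:              # nucleus cut applied within the top-k set
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in pool)
    return {token: prob / total for token, prob in pool}

example = {"the": 0.50, "a": 0.24, "an": 0.12, "this": 0.08, "that": 0.04, "one": 0.02}
print(combined_filter(example, k=3, p=0.9))     # top-k (3 tokens) is the binding constraint here
```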
Related Tools
Temperature Simulator
See how temperature affects distributions.
Tokenization Visualizer
See how text becomes tokens.
