Top-P & Top-K Explainer
Visualize how nucleus sampling (top-p) and top-k filtering affect token selection
Sampling Parameters
Top-P (0.9): selects tokens until their cumulative probability reaches 90%
Top-K (40): selects only the 40 most likely tokens
Token Probability Distribution
Summary
Tokens in candidate pool: 9
Probability mass covered: 91.0%
What is Top-P (Nucleus Sampling)?
Top-P, also called nucleus sampling, restricts sampling to the smallest set of tokens whose cumulative probability meets or exceeds P. For example, top-p=0.9 means the model considers only the most likely tokens that together account for at least 90% of the probability mass.
This adapts dynamically to the distribution: when the model is confident, only a few tokens make the cut; when it is uncertain, more tokens are included.
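A minimal sketch of the top-p cut in plain Python may help. The token names, probabilities, and the `top_p_filter` helper below are made-up illustrations, not the tool's actual distribution or any library's API:

```python
# Illustrative nucleus (top-p) filter over an already-normalized distribution.
# Token names and probabilities are hypothetical.
def top_p_filter(probs, p=0.9):
    # Sort tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= p:        # stop once the nucleus covers at least p
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    total = sum(prob for _, prob in pool)
    return {token: prob / total for token, prob in pool}

example = {"the": 0.50, "a": 0.24, "an": 0.12, "this": 0.08, "that": 0.04, "one": 0.02}
print(top_p_filter(example, p=0.9))   # keeps "the", "a", "an", "this" (94% of the mass)
```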
What is Top-K?
Top-K simply limits selection to the K most probable tokens, regardless of their actual probabilities. For example, top-k=40 means only the 40 most likely tokens are candidates for selection.
This is simpler but less adaptive: the candidate pool always contains exactly K tokens, whether the model is confident or uncertain.
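The equivalent top-k step is even simpler. This sketch reuses the same kind of made-up distribution with a small k so the cut is visible:

```python
# Illustrative top-k filter: keep only the k most likely tokens, then renormalize.
# Token names and probabilities are hypothetical.
def top_k_filter(probs, k=40):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool = ranked[:k]              # exactly k tokens (fewer only if the vocab is smaller)
    total = sum(prob for _, prob in pool)
    return {token: prob / total for token, prob in pool}

example = {"the": 0.50, "a": 0.24, "an": 0.12, "this": 0.08, "that": 0.04, "one": 0.02}
print(top_k_filter(example, k=3))  # keeps "the", "a", "an" regardless of how the mass is spread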
Comparison
| Aspect | Top-P | Top-K |
|---|---|---|
| Basis | Cumulative probability | Token count |
| Adaptive? | Yes - fewer tokens when confident | No - always K tokens |
| Common values | 0.9, 0.95 | 40, 50, 100 |
| Best for | Most use cases | Very large vocabularies |
Pro Tip: Use Top-P Alone
OpenAI recommends adjusting either temperature OR top-p, not both. Top-p=0.9 with temperature=1.0 usually gives good results. Adding top-k on top is rarely necessary.
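To make the tip concrete, here is a rough end-to-end sketch of one sampling step that uses temperature and top-p only. The logits are invented and the `sample_top_p` helper is ours, not any particular library's API:

```python
import math
import random

# Hypothetical logits for the next token; in a real model these come from the network.
logits = {"the": 2.1, "a": 1.4, "an": 0.7, "this": 0.3, "that": -0.2}

def sample_top_p(logits, temperature=1.0, top_p=0.9):
    # 1. Temperature scaling, then softmax to get a probability distribution.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {t: math.exp(v) / z for t, v in scaled.items()}
    # 2. Nucleus (top-p) cut: keep the smallest high-probability set covering top_p.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    pool, cumulative = [], 0.0
    for token, prob in ranked:
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # 3. Sample from the pool; random.choices accepts unnormalized weights.
    tokens, weights = zip(*pool)
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_top_p(logits, temperature=1.0, top_p=0.9))
```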
Frequently Asked Questions
Should I use top-p or top-k?
Top-p is generally preferred because it adapts to the model's confidence. Top-k is useful when you want a fixed, predictable cap on the number of candidate tokens.
What's a good top-p value?
0.9 is a common default. Lower values (0.7-0.8) reduce randomness, while higher values (0.95-1.0) allow more variety.
Can I use both together?
Yes, they can be combined. The model will use whichever filter is more restrictive at each step. However, using one at a time is usually clearer and easier to tune.
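If you do combine them, the effective candidate pool is whichever filter is stricter at that step. A small hedged sketch of one common ordering (top-k first, then the nucleus cut within that set); libraries differ in exactly how the two cuts interact, and the probabilities and parameter values are made up:

```python
# Illustrative combination: apply top-k first, then top-p within the top-k set.
# Token names, probabilities, and parameter values are hypothetical.
def combined_filter(probs, k=40, p=0.9):
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    top_k_pool = ranked[:k]                     # at most k tokens
    pool, cumulative = [], 0.0
    for token, prob in top_k_pool:              # nucleus cut applied within the top-k set
        pool.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in pool)
    return {token: prob / total for token, prob in pool}

example = {"the": 0.50, "a": 0.24, "an": 0.12, "this": 0.08, "that": 0.04, "one": 0.02}
print(combined_filter(example, k=3, p=0.9))     # top-k (3 tokens) is the binding constraint here
```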
Related Tools
Temperature Simulator
See how temperature affects distributions.
Tokenization Visualizer
See how text becomes tokens.
