Vector Dimensions Reference
Compare embedding model dimensions, storage requirements, and quality trade-offs
Embedding Models Comparison
| Model | Provider | Dims | Max Tokens | $/1M tokens | Notes |
|---|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8.2K | $0.13 | Best quality, supports dimension reduction |
| text-embedding-3-small | OpenAI | 1536 | 8.2K | $0.02 | Cost-effective, good quality |
| text-embedding-ada-002 | OpenAI | 1536 | 8.2K | $0.10 | Legacy model, still widely used |
| voyage-large-2 | Voyage AI | 1536 | 16.0K | $0.12 | High quality, long context |
| voyage-code-2 | Voyage AI | 1536 | 16.0K | $0.12 | Optimized for code |
| voyage-3 | Voyage AI | 1024 | 32.0K | $0.06 | Latest model, very long context |
| embed-english-v3.0 | Cohere | 1024 | 512 | $0.10 | English optimized |
| embed-multilingual-v3.0 | Cohere | 1024 | 512 | $0.10 | 100+ languages |
| all-MiniLM-L6-v2 | Hugging Face | 384 | 256 | Free | Free, lightweight, fast |
| all-mpnet-base-v2 | Hugging Face | 768 | 384 | Free | Free, good quality |
| bge-large-en-v1.5 | BAAI | 1024 | 512 | Free | Free, competitive quality |
| bge-m3 | BAAI | 1024 | 8.2K | Free | Multi-lingual, multi-functionality |
| e5-large-v2 | Microsoft | 1024 | 512 | Free | Free, strong performance |
| e5-mistral-7b-instruct | Microsoft | 4096 | 32.8K | Free | LLM-based, very high quality |
| jina-embeddings-v2 | Jina AI | 768 | 8.2K | $0.02 | Long context, good quality |
| gecko | Google | 768 | 2.0K | $0.03 | Vertex AI model |
| textembedding-gecko | Google | 768 | 3.1K | $0.03 | Latest Google embedding |
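Storage Calculator
| Vectors | Raw Storage | With Index (~30%) |
|---|---|---|
| 100.0K | 585.9 MB | 761.7 MB |
Formula: vectors × dimensions × 4 bytes (float32). The example figures correspond to 1536-dimensional vectors, with 1 MB counted as 2^20 bytes. Index overhead varies by database.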
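The formula can be sketched in a few lines of Python. The function names are illustrative, and the 30% index overhead is only the rough default used in the example figures; real overhead depends on the database and index type.

```python
def raw_storage_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone: vectors x dimensions x bytes per value (4 for float32)."""
    return num_vectors * dimensions * bytes_per_value


def with_index_bytes(raw_bytes: float, index_overhead: float = 0.30) -> float:
    """Add an assumed fractional index overhead on top of the raw size."""
    return raw_bytes * (1.0 + index_overhead)


def to_mb(num_bytes: float) -> float:
    """Convert bytes to MB, counting 1 MB as 2**20 bytes."""
    return num_bytes / (1024 ** 2)


raw = raw_storage_bytes(100_000, 1536)
print(f"raw: {to_mb(raw):.1f} MB")                          # raw: 585.9 MB
print(f"with index: {to_mb(with_index_bytes(raw)):.1f} MB")  # with index: 761.7 MB
```

Changing `bytes_per_value` to 2 models float16 storage, which halves the raw figure.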
Dimension Trade-offs
| Dimensions | Quality | Speed | Storage | Best For |
|---|---|---|---|---|
| 384 | Good | Very Fast | Small | Edge/mobile, high-volume |
| 768 | Better | Fast | Medium | General purpose |
| 1024 | High | Good | Medium-Large | Semantic search |
| 1536 | Very High | Moderate | Large | RAG, precision tasks |
| 3072 | Excellent | Slower | Very Large | Maximum quality |
| 4096 | Excellent | Slowest | Largest | Research, LLM-based |
What Are Vector Dimensions?
When text is converted to embeddings, each piece of text becomes a vector of numbers. The number of values in the vector is its number of dimensions. Higher dimensions can capture more nuanced meaning but require more storage and compute.
For example, OpenAI's text-embedding-3-large produces 3072-dimensional vectors, while lightweight models like MiniLM use only 384 dimensions.
Choosing the Right Dimensions
Consider Your Scale
Storage grows linearly with both the number of vectors and the number of dimensions. At 1M float32 vectors, 1536 dimensions take about 6 GB, while 384 dimensions take about 1.5 GB.
Balance Quality vs Speed
Higher dimensions can improve retrieval quality, but each similarity calculation does work proportional to the vector length, so search slows down as dimensions grow.
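To see where the slowdown comes from, here is a minimal pure-Python sketch of cosine similarity, assumed here as the similarity measure. Both the dot product and the norms touch every component once, so a brute-force comparison costs time linear in the number of dimensions. The `relative_cost` helper is a hypothetical illustration of that ratio; approximate indexes in real vector databases may scale differently.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; each loop visits every dimension once."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def relative_cost(dims_a: int, dims_b: int) -> float:
    """Rough per-comparison cost ratio, assuming work is proportional to dimensions."""
    return dims_a / dims_b


print(relative_cost(3072, 384))  # 8.0
```

Under this assumption, comparing two 3072-dimensional vectors does about eight times the arithmetic of comparing two 384-dimensional ones.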
Match Your Use Case
Code search may benefit from a specialized model such as voyage-code-2. Multilingual content may require a multilingual model such as embed-multilingual-v3.0 or bge-m3.
Pro Tip: Start Standard, Optimize Later
A standard mid-range size, such as 1536 dimensions (OpenAI's text-embedding-3-small) or 1024 dimensions (Cohere's embed v3 models), is a good starting point for most applications. Optimize dimensions later based on actual performance needs and benchmarks.
Frequently Asked Questions
Can I reduce dimensions after embedding?
Yes! OpenAI's text-embedding-3 models support dimension reduction through the API's `dimensions` parameter. You can also apply PCA or similar techniques to existing embeddings, though quality may decrease.
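For embeddings you have already stored, one simple local approach is to keep the leading components and re-normalize to unit length. The sketch below is an illustration rather than a provider's official method: it assumes the model was trained so that the leading components carry most of the information, as the text-embedding-3 models are designed to. For other models, truncation may hurt quality badly and a fitted technique such as PCA is the safer choice. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(embedding: Sequence[float], target_dims: int) -> List[float]:
    """Keep the first target_dims components, then rescale the result to unit length."""
    if not 0 < target_dims <= len(embedding):
        raise ValueError("target_dims must be between 1 and len(embedding)")
    truncated = list(embedding[:target_dims])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in truncated]


# Toy 3-dimensional vector reduced to 2 dimensions.
reduced = truncate_and_normalize([3.0, 4.0, 12.0], 2)
print(len(reduced))  # 2
```

Re-normalizing matters because cosine and dot-product search generally assume unit-length vectors, and truncation alone shortens them.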
Do more dimensions always mean better quality?
Not necessarily. Quality depends on the model architecture and training. A well-trained 768-dim model can outperform a poorly-trained 1536-dim model. Benchmark on your specific data.
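One way to benchmark on your own data is to measure recall@k for each candidate model over a set of queries with known relevant documents. The sketch below assumes you already have, per query, the ranked IDs returned by your search and a hand-labeled set of relevant IDs. The function names and sample IDs are hypothetical.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def recall_at_k(retrieved_ids: Sequence[Hashable], relevant_ids: Iterable[Hashable], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(results: List[Tuple[Sequence[Hashable], Iterable[Hashable]]], k: int) -> float:
    """Average recall@k over (retrieved_ids, relevant_ids) pairs, one pair per query."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in results) / len(results)


# Made-up results for two queries, purely for illustration.
sample = [
    (["d1", "d7", "d3"], ["d1", "d3"]),
    (["d9", "d2", "d4"], ["d4", "d5"]),
]
print(mean_recall_at_k(sample, k=3))  # 0.75
```

Running the same queries against each model and comparing the averages shows whether the extra dimensions actually help on your data.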
