Vector Dimensions Reference

Compare embedding model dimensions, storage requirements, and quality trade-offs

Embedding Models Comparison

| Model | Provider | Dims | Max Tokens | $/1M Tokens | Notes |
|---|---|---|---|---|---|
| text-embedding-3-large | OpenAI | 3072 | 8.2K | $0.13 | Best quality, supports dimension reduction |
| text-embedding-3-small | OpenAI | 1536 | 8.2K | $0.02 | Cost-effective, good quality |
| text-embedding-ada-002 | OpenAI | 1536 | 8.2K | $0.10 | Legacy model, still widely used |
| voyage-large-2 | Voyage AI | 1536 | 16.0K | $0.12 | High quality, long context |
| voyage-code-2 | Voyage AI | 1536 | 16.0K | $0.12 | Optimized for code |
| voyage-3 | Voyage AI | 1024 | 32.0K | $0.06 | Latest model, very long context |
| embed-english-v3.0 | Cohere | 1024 | 512 | $0.10 | English optimized |
| embed-multilingual-v3.0 | Cohere | 1024 | 512 | $0.10 | 100+ languages |
| all-MiniLM-L6-v2 | Hugging Face | 384 | 256 | Free | Lightweight, fast |
| all-mpnet-base-v2 | Hugging Face | 768 | 384 | Free | Good quality |
| bge-large-en-v1.5 | BAAI | 1024 | 512 | Free | Competitive quality |
| bge-m3 | BAAI | 1024 | 8.2K | Free | Multi-lingual, multi-functionality |
| e5-large-v2 | Microsoft | 1024 | 512 | Free | Strong performance |
| e5-mistral-7b-instruct | Microsoft | 4096 | 32.8K | Free | LLM-based, very high quality |
| jina-embeddings-v2 | Jina AI | 768 | 8.2K | $0.02 | Long context, good quality |
| gecko | Google | 768 | 2.0K | $0.03 | Vertex AI model |
| textembedding-gecko | Google | 768 | 3.1K | $0.03 | Latest Google embedding |
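The pricing column makes per-corpus cost simple arithmetic. A quick sketch (prices copied from the table above and certain to drift; check each provider for current rates):

```python
# $/1M tokens, copied from the comparison table above (subset; prices change).
PRICE_PER_M_TOKENS = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "voyage-3": 0.06,
}

def embedding_cost_usd(model: str, tokens: int) -> float:
    """Dollar cost to embed a corpus of `tokens` tokens with `model`."""
    return PRICE_PER_M_TOKENS[model] * tokens / 1_000_000

# Example: embedding a 500M-token corpus.
for model in PRICE_PER_M_TOKENS:
    print(f"{model}: ${embedding_cost_usd(model, 500_000_000):.2f}")
# text-embedding-3-large: $65.00
# text-embedding-3-small: $10.00
# voyage-3: $30.00
```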

Storage Calculator

Example: 100K vectors at 1536 dimensions (float32):

- Raw storage: 585.9 MB
- With index (~30% overhead): 761.7 MB

Formula: vectors × dimensions × 4 bytes (float32). Index overhead varies by database.
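The formula is easy to wrap in a helper; a minimal sketch (the ~30% index overhead is this page's rule of thumb, not a database guarantee):

```python
def storage_mb(num_vectors: int, dims: int, index_overhead: float = 0.30) -> tuple[float, float]:
    """Return (raw_mb, indexed_mb) for float32 vectors: vectors * dims * 4 bytes."""
    raw_mb = num_vectors * dims * 4 / (1024 ** 2)   # 4 bytes per float32 component
    return raw_mb, raw_mb * (1 + index_overhead)

raw, indexed = storage_mb(100_000, 1536)
print(f"{raw:.1f} MB raw, {indexed:.1f} MB with index")
# 585.9 MB raw, 761.7 MB with index
```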

Dimension Trade-offs

| Dimensions | Quality | Speed | Storage | Best For |
|---|---|---|---|---|
| 384 | Good | Very Fast | Small | Edge/mobile, high-volume |
| 768 | Better | Fast | Medium | General purpose |
| 1024 | High | Good | Medium-Large | Semantic search |
| 1536 | Very High | Moderate | Large | RAG, precision tasks |
| 3072 | Excellent | Slower | Very Large | Maximum quality |
| 4096 | Excellent | Slowest | Largest | Research, LLM-based |

What Are Vector Dimensions?

When text is converted to an embedding, each piece of text becomes a vector of numbers. The count of values in that vector is its dimensionality, usually just called its dimensions. Higher dimensionality can capture more nuanced meaning but requires more storage and compute.

For example, OpenAI's text-embedding-3-large produces 3072-dimensional vectors, while lightweight models like MiniLM use only 384 dimensions.
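A minimal sketch of the idea, using a random NumPy array as a stand-in for a real model's output:

```python
import numpy as np

# Stand-in for a real embedding: a MiniLM-sized (384-dim) float32 vector.
rng = np.random.default_rng(0)
embedding = rng.standard_normal(384).astype(np.float32)

print(embedding.shape[0])  # dimensionality: 384
print(embedding.nbytes)    # bytes for one vector: 384 * 4 = 1536
```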

Choosing the Right Dimensions

1. Consider your scale. More vectors mean more storage: at 1M vectors, 1536 dims ≈ 6 GB, while 384 dims ≈ 1.5 GB.

2. Balance quality vs. speed. Higher dimensions improve retrieval quality but slow down similarity calculations.

3. Match your use case. Code search may benefit from specialized models, and multilingual needs may require specific embeddings.
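The quality-vs-speed trade-off is mechanical: a cosine or dot-product comparison reads every component, so a brute-force scan does `n_vectors × dims` multiply-adds and per-query cost grows linearly with dimensions. A toy NumPy sketch (random vectors standing in for real embeddings):

```python
import numpy as np

def nearest(corpus: np.ndarray, query: np.ndarray) -> int:
    """Brute-force nearest neighbour by cosine similarity."""
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    return int(np.argmax(corpus @ query))   # n_vectors * dims multiply-adds

rng = np.random.default_rng(0)
for dims in (384, 1536):
    corpus = rng.standard_normal((10_000, dims)).astype(np.float32)
    print(dims, nearest(corpus, corpus[42]))  # a vector is nearest to itself -> 42
```

At 1536 dimensions the scan does 4x the arithmetic of the 384-dim scan over the same corpus, which is why approximate indexes (HNSW, IVF) matter more as dimensionality grows.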

Pro Tip: Start Standard, Optimize Later

1536 dimensions (OpenAI's text-embedding-3-small) is a good starting point for most applications; Cohere's v3 models at 1024 dimensions are a comparable choice. Optimize dimensions later based on actual performance needs and benchmarks.

Frequently Asked Questions

Can I reduce dimensions after embedding?

Yes! OpenAI's text-embedding-3 models support dimension reduction through the API. You can also use PCA or other techniques to reduce existing embeddings, though quality may decrease.
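The manual route is truncate-and-renormalize, which OpenAI describes for shortening text-embedding-3 vectors yourself; for other model families, PCA is usually the safer choice. A sketch in NumPy:

```python
import numpy as np

def shorten(embedding: np.ndarray, target_dims: int) -> np.ndarray:
    """Keep the first target_dims components, then re-normalize to unit length."""
    cut = embedding[:target_dims]
    return cut / np.linalg.norm(cut)

# Hypothetical full-size vector (text-embedding-3-large is 3072-dim).
rng = np.random.default_rng(0)
full = rng.standard_normal(3072).astype(np.float32)

short = shorten(full, 256)
print(short.shape)  # (256,)
```

Whether quality holds up after naive truncation depends on whether the model was trained for it (Matryoshka-style, as text-embedding-3 was); for arbitrary models, benchmark before committing.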

Do more dimensions always mean better quality?

Not necessarily. Quality depends on the model architecture and training. A well-trained 768-dim model can outperform a poorly-trained 1536-dim model. Benchmark on your specific data.

Related Tools

- Cosine Similarity: calculate similarity between vectors.
- Embedding Visualizer: visualize high-dimensional embeddings.