Latency Benchmark
Benchmark and visualize API latency with percentile statistics
Related Tools
- Model A/B Test Evaluator: Analyze results from model A/B tests for statistical significance
- BLEU & ROUGE Calculator: Calculate standard text generation metrics between reference and hypothesis
- Confusion Matrix Visualizer: Generate and analyze confusion matrices for classification models
- Evaluation Harness Config: Generate configuration files for LM Evaluation Harness
- Human Eval Form: Create grading rubrics and forms for human evaluation of LLM outputs
What is Latency Benchmarking?
Latency benchmarking measures how long API calls take to complete. For LLM applications, understanding the full latency distribution, not just the average, helps you set realistic timeouts, improve user experience, and identify performance issues.
This simulator demonstrates latency profiling concepts. Configure delay ranges to simulate different API behaviors and see how percentiles reveal the full latency picture.
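A minimal sketch of what a simulator like this does under the hood: sleep for a random delay in a configured range, time each "call", and collect the samples. The function and parameter names (`simulated_api_call`, `run_benchmark`, the 200-1500ms range) are illustrative assumptions, not the tool's actual code.

```python
import random
import time

def simulated_api_call(min_delay_ms: float = 200, max_delay_ms: float = 1500) -> None:
    """Stand-in for a real API call: sleeps for a random delay in the configured range."""
    time.sleep(random.uniform(min_delay_ms, max_delay_ms) / 1000)

def run_benchmark(n_requests: int = 100) -> list[float]:
    """Time each call and return the observed latencies in milliseconds."""
    latencies = []
    for _ in range(n_requests):
        start = time.perf_counter()
        simulated_api_call()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies
```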
Understanding Percentiles
P50 (Median)
Half of all requests complete faster than this value; a good indicator of the typical user experience.
P95 / P99 (Tail)
Near-worst-case latencies: 5% (P95) or 1% (P99) of requests are slower than these values. Base timeouts on the tail, not the median, to avoid cutting off slow-but-successful requests.
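As a sketch, the nearest-rank method below shows how these percentiles fall out of a sorted sample. The sample values are made up for illustration; in practice `numpy.percentile` or `statistics.quantiles` do the same job.

```python
import math

def percentile(latencies: list[float], p: float) -> float:
    """Nearest-rank percentile: the value that p% of samples fall at or below."""
    ranked = sorted(latencies)
    index = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[index]

# Ten latency samples (ms); note how the slow outliers barely move
# the P50 but dominate the tail.
samples = [276, 289, 312, 398, 451, 488, 502, 611, 1830, 2210]
print(f"P50: {percentile(samples, 50)}ms")  # 451ms: typical experience
print(f"P95: {percentile(samples, 95)}ms")  # 2210ms: tail
```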
Typical LLM API Latencies
| API Type | Typical P50 | Typical P95 |
|---|---|---|
| GPT-4 (short) | 500-1000ms | 2-5s |
| GPT-3.5 | 200-500ms | 1-2s |
| Embeddings | 50-200ms | 300-500ms |
FAQ
Does this call real APIs?
No. It simulates latency with random delays. Use it to learn percentile concepts before benchmarking real endpoints; adapting the harness to a real API is sketched below.
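Moving to real benchmarking only requires replacing the simulated call with a timed real request. A minimal version, assuming the third-party `requests` library and a hypothetical endpoint:

```python
import time
import requests  # third-party HTTP client: pip install requests

def timed_request(url: str, payload: dict) -> float:
    """Send one real request and return its latency in milliseconds."""
    start = time.perf_counter()
    requests.post(url, json=payload, timeout=60)
    return (time.perf_counter() - start) * 1000

# Hypothetical usage; the URL and payload are placeholders.
# latency_ms = timed_request("https://api.example.com/v1/chat", {"prompt": "hi"})
```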
What timeout should I set?
Base it on the P99 or maximum latency from real benchmarks, plus a buffer for network variance. 30-60s is typical for LLM completions.
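As a sketch, that advice translates to a rule of thumb like the one below; the function name, buffer value, and clamp range are illustrative assumptions.

```python
def choose_timeout(p99_ms: float, buffer_s: float = 10.0) -> float:
    """P99 from real benchmarks plus a variance buffer, clamped to the 30-60s range."""
    return min(max(p99_ms / 1000 + buffer_s, 30.0), 60.0)

print(choose_timeout(2210))  # 2.21s P99 + 10s buffer -> clamped up to 30.0s
```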
