AI Eval Collection
Evaluate and benchmark AI model performance
Model A/B Test Evaluator
Analyze results from model A/B tests for statistical significance
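A two-proportion z-test is one common way to check such A/B results for significance. The sketch below is illustrative only (the function name and sample numbers are hypothetical, not this tool's API) and compares the success rates of two model variants using just the standard library:

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two success rates."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled rate under the null hypothesis that both models perform equally
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical data: model A wins 870/1000 ratings, model B wins 830/1000
z, p = two_proportion_z_test(870, 1000, 830, 1000)
```

With these made-up counts the difference is significant at the usual 0.05 level; with equal counts the p-value is 1.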
BLEU & ROUGE Calculator
Calculate standard text-generation metrics between a reference and a hypothesis
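BLEU and ROUGE come in several variants; as an illustration of the underlying idea, the unigram forms (BLEU-1 modified precision, ROUGE-1 recall) fit in a few lines. Function names here are hypothetical, and full BLEU additionally uses higher-order n-grams and a brevity penalty:

```python
from collections import Counter

def bleu_1(reference, hypothesis):
    """BLEU-1: clipped unigram precision (single reference, no brevity penalty)."""
    ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())  # counts clipped by the reference
    return overlap / max(sum(hyp.values()), 1)

def rouge_1(reference, hypothesis):
    """ROUGE-1 recall: fraction of reference unigrams recovered by the hypothesis."""
    ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref & hyp).values())
    return overlap / max(sum(ref.values()), 1)

print(bleu_1("the cat sat on the mat", "the cat"))   # precise but short
print(rouge_1("the cat sat on the mat", "the cat"))  # recall penalizes brevity
```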
Confusion Matrix Visualizer
Generate and analyze confusion matrices for classification models
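As a sketch of what such analysis involves (helper names are hypothetical), a confusion matrix with true labels as rows and predictions as columns yields per-class precision and recall directly:

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_precision_recall(matrix, labels):
    """Precision = TP / column sum; recall = TP / row sum, per class."""
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp
        fn = sum(matrix[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        stats[label] = (precision, recall)
    return stats

m = confusion_matrix(["cat", "cat", "dog", "dog"],
                     ["cat", "dog", "dog", "dog"],
                     ["cat", "dog"])
```

Here "cat" has perfect precision but only 0.5 recall, since one cat was misclassified as a dog.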
Evaluation Harness Config
Generate configuration files for LM Evaluation Harness
Human Eval Form
Create grading rubrics and forms for human evaluation of LLM outputs
Latency Benchmark Recorder
Record and visualize latency metrics from your own API tests
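Latency reports typically summarize tail percentiles (p95, p99) rather than just the mean, since slow outliers dominate user experience. A minimal sketch using the nearest-rank percentile method (function name is illustrative, not this tool's API):

```python
import math

def latency_summary(samples_ms):
    """Summarize latency samples (milliseconds) with mean and tail percentiles."""
    ordered = sorted(samples_ms)

    def pct(p):
        # Nearest-rank method: the ceil(p% * n)-th smallest sample, 1-indexed
        k = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[k - 1]

    return {
        "mean": sum(ordered) / len(ordered),
        "p50": pct(50),
        "p95": pct(95),
        "p99": pct(99),
    }

summary = latency_summary(list(range(1, 101)))  # 1..100 ms dummy samples
```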
Why Use Our AI Tools?
🌐
Free & Online
Use these tools directly in your browser without installation.
🔒
Private
All processing happens locally on your device where possible.
⚡
Efficient
Optimized for speed and productivity.
