Confusion Matrix
Build and analyze confusion matrices for classification
Related Tools
- Evaluation Harness Config: Generate configuration files for LM Evaluation Harness
- Human Eval Form: Create grading rubrics and forms for human evaluation of LLM outputs
- Latency Benchmark Recorder: Record and visualize latency metrics from your own API tests
- Model A/B Test Evaluator: Analyze results from model A/B tests for statistical significance
- BLEU & ROUGE Calculator: Calculate standard text generation metrics between reference and hypothesis
What is a Confusion Matrix?
A confusion matrix is a table that visualizes the performance of a classification model. It shows where the model gets confused—predicting one class when the actual class was different—hence the name.
This tool helps you build a confusion matrix interactively and automatically calculates key metrics like accuracy, precision, recall, and F1 score.
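If you want to sanity-check the tool's numbers in code, the sketch below builds the same 2x2 matrix from paired actual/predicted labels. It assumes you have scikit-learn installed; the label lists are made-up example data, not anything produced by this tool.

```python
# Minimal sketch: reproduce a binary confusion matrix in code.
# Assumes scikit-learn is available; y_true / y_pred are made-up examples.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual classes (1 = positive)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

# With labels=[0, 1], rows are actual [neg, pos] and columns are predicted
# [neg, pos], so the returned layout is [[TN, FP], [FN, TP]].
(tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```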
Matrix Components
True Positive (TP)
Actually positive, predicted positive. Model correctly identified the positive class.
True Negative (TN)
Actually negative, predicted negative. Model correctly identified the negative class.
False Positive (FP)
Actually negative, predicted positive. Model raised a false alarm.
False Negative (FN)
Actually positive, predicted negative. Model missed a positive case.
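As a plain-Python illustration of the four cells (no libraries; the `tally` helper and its example inputs are made up for this sketch, assuming binary 0/1 labels), every prediction falls into exactly one cell:

```python
# Tally the four confusion-matrix cells by hand; assumes labels are 0/1 with 1 = positive.
def tally(y_true, y_pred):
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1   # correctly flagged positive
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1   # correctly flagged negative
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1   # false alarm
        else:
            counts["FN"] += 1   # missed positive
    return counts

print(tally([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0]))
# {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```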
Metric Formulas
| Metric | Formula | Meaning |
|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Overall correctness |
| Precision | TP/(TP+FP) | Positive prediction quality |
| Recall | TP/(TP+FN) | Positive detection rate |
| F1 | 2*(P*R)/(P+R) | Harmonic mean of precision (P) and recall (R) |
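The same formulas translated directly into code, as a minimal sketch that takes the four counts as inputs (the `metrics` helper is a hypothetical name; it returns 0.0 when a denominator would be zero):

```python
def metrics(tp, tn, fp, fn):
    # Direct translations of the formulas in the table above.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(metrics(tp=3, tn=3, fp=1, fn=1))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```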
FAQ
When is accuracy misleading?
With imbalanced classes (e.g., 95% negative), predicting all negative gives 95% accuracy but 0% recall for positives.
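A quick worked example of that failure mode, using hypothetical counts for an "always predict negative" model on a 95%-negative dataset:

```python
# 100 examples: 5 positive, 95 negative; the model predicts negative every time,
# so TP=0, FN=5, TN=95, FP=0.
tp, tn, fp, fn = 0, 95, 0, 5
accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.95 -- looks great
recall = tp / (tp + fn)                     # 0.00 -- every positive was missed
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```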
What is the precision vs. recall tradeoff?
High precision means fewer false alarms; high recall means fewer missed cases. F1 balances the two. Choose based on whether false positives or false negatives are more costly for your use case.
