BLEU Score Calculator

Calculate BLEU (Bilingual Evaluation Understudy) scores for translation and text generation

Calculation Details

For a sample hypothesis of 7 tokens scored against a reference of 6 tokens, the calculator reports:

P1: 4/7 = 57.1%
P2: 2/6 = 33.3%
P3: 1/5 = 20.0%
P4: 0/4 = 0.0%
BP: 1.0000 (hyp = 7, ref = 6)
BLEU = 1.0000 × 0.0014 = 0.0014 (Very Poor)

Since the 4-gram precision is zero, an unsmoothed geometric mean would be exactly zero; the small nonzero score reflects smoothing applied to P4.


What is BLEU Score?

BLEU (Bilingual Evaluation Understudy) is a metric for evaluating machine translation quality. It measures how similar a generated text is to a reference text by comparing n-gram overlaps.

BLEU scores range from 0 to 1 (or 0 to 100 when expressed as percentages), where higher scores indicate better matches with the reference.
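The n-gram overlap at the heart of BLEU can be sketched in a few lines. This is a minimal illustration, not a full BLEU implementation; the example sentences are invented for demonstration:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def overlap(hyp, ref, n):
    """Number of hypothesis n-grams that also appear in the reference,
    clipped so each reference n-gram is matched at most as often as it occurs."""
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    return sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(overlap(hyp, ref, 1))  # → 5 shared unigrams (clipped)
print(overlap(hyp, ref, 2))  # → 3 shared bigrams
```

Dividing each overlap by the number of hypothesis n-grams gives the per-order precisions that BLEU combines.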

How BLEU Works

1. N-gram Precision: Count matching n-grams (1-4 words) between hypothesis and reference.

2. Clipped Counts: Clip each n-gram's count at its count in the reference, so the score cannot be gamed by repeating words.

3. Brevity Penalty: Penalize hypotheses that are shorter than the reference.

4. Geometric Mean: Combine the four precisions using a geometric mean to produce the final score.
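The four steps above can be sketched as a single-sentence, single-reference function. This is a simplified illustration: the `eps` floor standing in for proper smoothing is an assumption of this sketch, and production implementations (e.g. sacreBLEU) also handle tokenization, multiple references, and corpus-level aggregation:

```python
import math
from collections import Counter

def ngram_counts(tokens, n):
    """Multiset of contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hyp, ref, max_n=4, eps=1e-9):
    # Steps 1-2: clipped n-gram precisions for n = 1..max_n
    precisions = []
    for n in range(1, max_n + 1):
        hyp_c = ngram_counts(hyp, n)
        ref_c = ngram_counts(ref, n)
        clipped = sum(min(c, ref_c[g]) for g, c in hyp_c.items())
        total = max(len(hyp) - n + 1, 0)
        precisions.append(clipped / total if total else 0.0)
    # Step 3: brevity penalty (1.0 when the hypothesis is at least reference length)
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    # Step 4: geometric mean of the precisions; eps keeps the log defined
    # when some precision is zero (a stand-in for real smoothing)
    log_mean = sum(math.log(p if p > 0 else eps) for p in precisions) / max_n
    return bp * math.exp(log_mean)

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(bleu(hyp, ref))  # small score: the 4-gram precision is zero
```

A perfect match scores 1.0, and any zero precision drags the geometric mean toward zero, which is why short or partially matching hypotheses score so low.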

Score Interpretation

Score   Quality     Interpretation
>60     Excellent   High quality, near-human translation
40-60   Good        Understandable, mostly accurate
20-40   Fair        Gist is preserved, rough translation
<20     Poor        Low similarity to reference
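The table above can be expressed as a small lookup. How the exact boundary values (60, 40, 20) are assigned is an assumption here, since the table leaves the edges ambiguous:

```python
def quality_band(score):
    """Quality label for a BLEU score on the 0-100 scale, per the table above.
    Boundary handling (e.g. exactly 60 mapping to "Good") is a choice made
    for this sketch, not something the table specifies."""
    if score > 60:
        return "Excellent"
    if score >= 40:
        return "Good"
    if score >= 20:
        return "Fair"
    return "Poor"

print(quality_band(35))  # → Fair
```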

Limitations of BLEU

BLEU only measures surface-level n-gram overlap. It doesn't capture meaning, fluency, or semantic equivalence. A paraphrase may have low BLEU despite being a good translation.
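The paraphrase problem is easy to demonstrate with clipped n-gram matching. The sentence pair below is invented for illustration: the hypothesis is a faithful paraphrase of the reference, yet shares almost no n-grams with it:

```python
from collections import Counter

def clipped_matches(hyp, ref, n):
    """Hypothesis n-grams also found in the reference, with clipped counts."""
    hyp_c = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_c = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    return sum(min(c, ref_c[g]) for g, c in hyp_c.items())

ref = "the movie was fantastic".split()
hyp = "the film was great".split()   # same meaning, different words

print(clipped_matches(hyp, ref, 1))  # → 2 of 4 unigrams match
print(clipped_matches(hyp, ref, 2))  # → 0 of 3 bigrams match
```

With zero bigram (and higher-order) matches, BLEU rates this near zero even though a human would judge the translation as good, which is why BLEU is best used alongside semantic metrics or human evaluation.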

Related Tools

ROUGE Score

Calculate ROUGE for summarization evaluation.

Perplexity Calculator

Calculate perplexity for language models.