Annotation Converter

Convert NER annotations between IOB, spaCy, Label Studio, and CoNLL formats

What is NER Annotation Conversion?

Named Entity Recognition (NER) is a fundamental NLP task that identifies and classifies named entities in text—such as people, organizations, locations, and dates. Different tools and frameworks use different annotation formats, making it challenging to share datasets or switch between platforms.

This annotation converter transforms NER training data between popular formats: IOB/BIO (the standard token-level format), spaCy's JSON format, Label Studio's output format, and CoNLL. Convert your annotations without manual reformatting.

Supported Annotation Formats

IOB/BIO Format

One token per line, each tagged B- (beginning of an entity), I- (inside an entity), or O (outside any entity). This is the most common format in NER research and model training.
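As a rough illustration of reading this format, the sketch below parses token-per-line text into sentences of (token, tag) pairs. It assumes whitespace-separated columns with the token first and the tag last, and a blank line between sentences. Those are common conventions, not a description of this tool's internals, and the function name `parse_iob` is hypothetical.

```python
def parse_iob(text):
    """Parse token-per-line IOB text into sentences of (token, tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Assumption: a blank line marks the end of a sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # Assumption: first column is the token, last column is the tag.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


sample = """John B-PER
Smith I-PER
works O
at O
Google B-ORG

Paris B-LOC
"""
sentences = parse_iob(sample)
```

Taking the last column as the tag means files with extra columns between token and tag (as in some CoNLL-style files) are read the same way.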

spaCy Format

JSON with the full text and character-offset entity spans (start, end, label). This is the span layout commonly used when preparing training data for spaCy NER models.
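To show how character spans relate to token tags, here is a minimal sketch that turns a text plus (start, end, label) spans into IOB2 tags. The regex tokenizer and the function name `spans_to_iob` are assumptions made for illustration. The sketch also assumes that entity boundaries line up with token boundaries.

```python
import re


def spans_to_iob(text, entities):
    """Tag each token with B-/I-/O based on character-offset entity spans."""
    tagged = []
    # Assumption: tokens are runs of word characters or single punctuation marks.
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tag = "O"
        for start, end, label in entities:
            # A token belongs to an entity if it lies fully inside the span.
            if start <= match.start() and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tagged.append((match.group(), tag))
    return tagged


text = "John Smith works at Google."
entities = [(0, 10, "PER"), (20, 26, "ORG")]
tagged = spans_to_iob(text, entities)
```

A real tokenizer may split text differently, in which case spans that start or end mid-token need separate handling.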

Label Studio Format

JSON with value objects containing start, end, text, and labels. This is the export format of the Label Studio annotation tool.
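The sketch below shows one way to turn such value objects into (start, end, label) spans. It assumes each item in the list carries a "value" key with the four fields named above. The function name `label_studio_to_spans` and the sample data are made up for illustration.

```python
def label_studio_to_spans(results):
    """Convert Label Studio result items into sorted (start, end, label) spans."""
    spans = []
    for item in results:
        value = item["value"]
        # "labels" is a list, so one region can yield more than one span.
        for label in value["labels"]:
            spans.append((value["start"], value["end"], label))
    return sorted(spans)


results = [
    {"value": {"start": 20, "end": 26, "text": "Google", "labels": ["ORG"]}},
    {"value": {"start": 0, "end": 10, "text": "John Smith", "labels": ["PER"]}},
]
spans = label_studio_to_spans(results)
```

Sorting by start offset keeps the output stable regardless of the order in which regions were annotated.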

IOB Tag Meanings

Tag     Meaning                            Example
B-PER   Beginning of person name           John
I-PER   Inside (continuation) of person    Smith (after John)
B-ORG   Beginning of organization          Google
O       Outside (not an entity)            works, at, the
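The tag meanings above can be applied mechanically to recover whole entities. The following is a minimal sketch, with a hypothetical function name `iob_to_entities`, that groups consecutive B- and I- tokens into (entity text, label) pairs. Joining tokens with a single space is an assumption.

```python
def iob_to_entities(pairs):
    """Group (token, tag) pairs into (entity_text, label) tuples."""
    entities = []
    tokens, label = [], None
    for token, tag in pairs:
        if tag.startswith("B-"):
            # B- always opens a new entity, closing any open one first.
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            # I- continues the open entity of the same type.
            tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity (treated as outside here).
            if tokens:
                entities.append((" ".join(tokens), label))
            tokens, label = [], None
    if tokens:
        entities.append((" ".join(tokens), label))
    return entities


pairs = [("John", "B-PER"), ("Smith", "I-PER"), ("works", "O"),
         ("at", "O"), ("Google", "B-ORG")]
found = iob_to_entities(pairs)
```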

Common Entity Types

  • PER — Person names
  • ORG — Organizations
  • LOC — Locations
  • DATE — Dates and times
  • MONEY — Monetary values
  • PRODUCT — Product names

Frequently Asked Questions

What's the difference between IOB and IOB2?

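IOB2 (used here) always uses B- for the first token of an entity. The original IOB scheme (IOB1) starts entities with I- and uses B- only when an entity immediately follows another entity of the same type, to mark the boundary between them.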
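The difference is easiest to see in a conversion. This sketch, with a hypothetical function name `iob1_to_iob2`, rewrites a sequence of IOB1 tags as IOB2 by promoting every I- tag that opens an entity to B-.

```python
def iob1_to_iob2(tags):
    """Rewrite IOB1 tags so every entity starts with B- (IOB2)."""
    converted = []
    prev = "O"
    for tag in tags:
        # An I- tag opens a new entity if nothing of the same type precedes it.
        if tag.startswith("I-") and (prev == "O" or prev[2:] != tag[2:]):
            converted.append("B-" + tag[2:])
        else:
            converted.append(tag)
        prev = tag
    return converted


iob1 = ["I-PER", "I-PER", "O", "I-ORG", "B-ORG", "I-LOC"]
iob2 = iob1_to_iob2(iob1)
```

Existing B- tags pass through unchanged, so running the function on data that is already IOB2 leaves it as it was.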

Why use spaCy format over IOB?

spaCy's span-based format preserves the original text and whitespace exactly. Token-per-line IOB stores only the tokens, so spacing between words and around punctuation is lost and the exact original text cannot always be reconstructed.
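The spacing issue can be seen by going from IOB back to spans. In this sketch the text has to be rebuilt by joining tokens with a single space, which is an assumption forced by the format. The function name `iob_to_spans` is hypothetical.

```python
def iob_to_spans(pairs):
    """Rebuild text and (start, end, label) spans from (token, tag) pairs."""
    parts, spans = [], []
    offset = 0
    open_span = None  # [start, end, label] of the entity being built
    for token, tag in pairs:
        start, end = offset, offset + len(token)
        if tag.startswith("B-"):
            if open_span:
                spans.append(tuple(open_span))
            open_span = [start, end, tag[2:]]
        elif tag.startswith("I-") and open_span and open_span[2] == tag[2:]:
            open_span[1] = end  # extend the open entity
        else:
            if open_span:
                spans.append(tuple(open_span))
            open_span = None
        parts.append(token)
        offset = end + 1  # assumption: exactly one space between tokens
    if open_span:
        spans.append(tuple(open_span))
    return " ".join(parts), spans


pairs = [("John", "B-PER"), ("Smith", "I-PER"), ("works", "O"),
         ("at", "O"), ("Google", "B-ORG"), (".", "O")]
rebuilt, spans = iob_to_spans(pairs)
```

Here the rebuilt text ends in "Google ." with a space before the period, which would not match a source sentence written as "Google." and illustrates the information that the token format drops.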

Can I convert Label Studio exports directly?

Yes. Paste the annotations array from Label Studio's export and convert it to your target format. This tool handles the standard NER task output format.