I have data which is already labelled in SpaCy format. For example:
("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]})
But I want to try training it with any other NER model, such as BERT-NER, which requires IOB tagging instead. Is there any conversion code from SpaCy data format to IOB?
Thanks!
This is closely related to and mostly copied from https://stackoverflow.com/a/59209377/461847, see the notes in the comments there, too:
import spacy
from spacy.gold import biluo_tags_from_offsets
TRAIN_DATA = [
("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
("I like London and Berlin.", {"entities": [(7, 13, "LOC"), (18, 24, "LOC")]}),
]
nlp = spacy.load('en_core_web_sm')
docs = []
for text, annot in TRAIN_DATA:
doc = nlp(text)
tags = biluo_tags_from_offsets(doc, annot['entities'])
# then convert L->I and U->B to have IOB tags for the tokens in the doc
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With