What do the BILOU tags mean in Named Entity Recognition?

Tags: nlp, named-entity-recognition

The title pretty much sums up the question. I've noticed that some papers refer to a BILOU encoding scheme for NER, as opposed to the typical BIO tagging scheme (for example, this 2009 paper by Ratinov and Roth: http://cogcomp.cs.illinois.edu/page/publication_view/199).

From working with the 2003 CoNLL data I know that

B stands for 'beginning' (signifies beginning of an NE)
I stands for 'inside' (signifies that the word is inside an NE)
O stands for 'outside' (signifies that the word is just a regular word outside of an NE)

I've also been told that the letters in BILOU stand for

B - 'beginning'
I - 'inside'
L - 'last'
O - 'outside'
U - 'unit'

I've also seen people reference two other tags

E - 'end', used concurrently with the 'last' tag
S - 'singleton', used concurrently with the 'unit' tag

I'm pretty new to the NER literature, and I've been unable to find anything that clearly explains these tags. In particular, my question is what the difference between the 'last' and 'end' tags is, and what the 'unit' tag stands for.

411

asked Jun 14 '13 20:06

GrantD71

1 Answer

Based on an issue and a patch in ClearTK, it seems like BILOU stands for "Beginning, Inside and Last tokens of multi-token chunks, Unit-length chunks and Outside". For instance, the chunking denoted by brackets

(foo foo foo) (bar) no no no (bar bar)

can be encoded with BILOU as

B-foo, I-foo, L-foo, U-bar, O, O, O, B-bar, L-bar
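To make the encoding concrete, here is a minimal Python sketch that reproduces the example above. It is not taken from ClearTK; the function names `bilou_encode` and `bilou_to_bio` and the `(label, length)` input format are made up for illustration. The second function assumes the BIO variant in which every chunk starts with a B tag.

```python
def bilou_encode(chunks):
    """Encode chunks as BILOU tags.

    `chunks` is a list of (label, length) pairs (a hypothetical input
    format); a label of None marks a run of tokens outside any chunk.
    """
    tags = []
    for label, length in chunks:
        if label is None:
            tags.extend(["O"] * length)          # outside tokens
        elif length == 1:
            tags.append(f"U-{label}")            # unit-length chunk
        else:
            tags.append(f"B-{label}")            # first token
            tags.extend([f"I-{label}"] * (length - 2))  # middle tokens
            tags.append(f"L-{label}")            # last token
    return tags


def bilou_to_bio(tags):
    """Collapse BILOU tags to BIO: L becomes I, U becomes B."""
    mapping = {"L": "I", "U": "B"}
    bio = []
    for tag in tags:
        if tag == "O":
            bio.append(tag)
        else:
            prefix, label = tag.split("-", 1)
            bio.append(f"{mapping.get(prefix, prefix)}-{label}")
    return bio


# (foo foo foo) (bar) no no no (bar bar)
example = [("foo", 3), ("bar", 1), (None, 3), ("bar", 2)]
bilou = bilou_encode(example)
print(", ".join(bilou))
print(", ".join(bilou_to_bio(bilou)))
```

The first printed line should match the BILOU sequence given above. The second shows what is lost when collapsing to BIO: the end of a chunk and single-token chunks are no longer marked explicitly.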

169

answered Sep 30 '22 13:09

mbatchkarov
