
What do the BILOU tags mean in Named Entity Recognition?

The title pretty much sums up the question. I've noticed that some papers refer to a BILOU encoding scheme for NER as opposed to the typical BIO tagging scheme (for example, this 2009 paper by Ratinov and Roth: http://cogcomp.cs.illinois.edu/page/publication_view/199).

From working with the 2003 CoNLL data I know that

B stands for 'beginning' (signifies the beginning of an NE)
I stands for 'inside' (signifies that the word is inside an NE)
O stands for 'outside' (signifies that the word is just a regular word outside of an NE)

I've been told that the letters in BILOU stand for

B - 'beginning'
I - 'inside'
L - 'last'
O - 'outside'
U - 'unit'

I've also seen people reference two other tags:

E - 'end', used concurrently with the 'last' tag
S - 'singleton', used concurrently with the 'unit' tag

I'm pretty new to the NER literature, but I've been unable to find anything that clearly explains these tags. In particular, I'd like to know what the difference between the 'last' and 'end' tags is, and what the 'unit' tag stands for.

GrantD71 asked Jun 14 '13 20:06




1 Answer

Based on an issue and a patch in ClearTK, it seems that BILOU stands for "Beginning, Inside and Last tokens of multi-token chunks, Unit-length chunks and Outside". For instance, the chunking denoted by brackets

(foo foo foo) (bar) no no no (bar bar) 

can be encoded with BILOU as

B-foo, I-foo, L-foo, U-bar, O, O, O, B-bar, L-bar 
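
To make the scheme concrete, here is a minimal Python sketch of the encoding described above. The function names and the (label, length) chunk representation are my own assumptions for illustration; they are not taken from ClearTK or any other library. The second function shows how plain BIO tags could be rewritten as BILOU, assuming well-formed BIO input.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a chunk is (label, length).
# A label of None marks a run of tokens outside any chunk.
Chunk = Tuple[Optional[str], int]


def encode_bilou(chunks: List[Chunk]) -> List[str]:
    """Encode a sequence of chunks as BILOU tags (assumes every length >= 1)."""
    tags: List[str] = []
    for label, length in chunks:
        if label is None:
            tags.extend(["O"] * length)          # tokens outside any chunk
        elif length == 1:
            tags.append(f"U-{label}")            # unit-length chunk
        else:
            tags.append(f"B-{label}")            # first token
            tags.extend([f"I-{label}"] * (length - 2))  # middle tokens, if any
            tags.append(f"L-{label}")            # last token
    return tags


def bio_to_bilou(bio: List[str]) -> List[str]:
    """Rewrite well-formed BIO tags as BILOU tags."""
    bilou: List[str] = []
    for i, tag in enumerate(bio):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = bio[i + 1] if i + 1 < len(bio) else "O"
        # The chunk continues only if the next tag is an I- tag with the same label.
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            bilou.append(tag if continues else f"U-{label}")
        else:  # prefix == "I"
            bilou.append(tag if continues else f"L-{label}")
    return bilou


# The bracketed example: (foo foo foo) (bar) no no no (bar bar)
example = [("foo", 3), ("bar", 1), (None, 3), ("bar", 2)]
print(", ".join(encode_bilou(example)))
```

Note that in plain BIO the same chunking would be B-foo, I-foo, I-foo, B-bar, O, O, O, B-bar, I-bar, so the only changes are that the final token of each multi-token chunk becomes L and each single-token chunk becomes U.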
mbatchkarov answered Sep 30 '22 13:09
