In natural language processing, what is the purpose of chunking?

TIMEX asked Oct 21 '09 05:10


People also ask

What is an example of chunking?

By grouping individual data points into a larger whole, you can increase the amount of information you can remember. (This is chunking in the cognitive-psychology sense, not the NLP sense.) Probably the most common example is phone numbers: the digit sequence 4-7-1-1-3-2-4 is usually chunked into 471-1324.

How is chunking performed with regular expressions?

Chunking is commonly done with regular expressions. For example, the `*` quantifier means the preceding element can occur zero or more times, i.e. it may or may not be present. The pattern ab* matches an a followed by zero or more b's, so it matches a, ab, abb, abbb, and so on.
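A quick check of the `*` behaviour with Python's standard `re` module. `fullmatch` is used so that the whole string must match the pattern. In a chunk grammar the same quantifier is applied to part-of-speech tags instead of letters (for example, allowing any number of adjectives before a noun).

```python
import re

# "*" applies only to the element right before it, so "ab*" means
# an "a" followed by zero or more "b"s.
pattern = re.compile(r"ab*")

results = {text: bool(pattern.fullmatch(text)) for text in ["a", "ab", "abb", "abbb", "b", "ba"]}
print(results)
# {'a': True, 'ab': True, 'abb': True, 'abbb': True, 'b': False, 'ba': False}
```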

What is chunking in Python?

Chunking is the process of grouping adjacent words into phrases based on their parts of speech. To do this, you typically define a grammar that says which sequences of tags (for example, adjectives followed by nouns) form a chunk, and the chunker applies those patterns to a tagged sentence.
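A minimal sketch of grammar-driven chunking in plain Python, assuming the sentence is already part-of-speech tagged with Penn Treebank-style tags (hand-assigned here). The grammar is a single hypothetical noun-phrase rule written in the angle-bracket tag notation that NLTK-style chunk grammars use: an optional determiner, any number of adjectives, then one or more nouns. `NP_PATTERN` and `chunk_noun_phrases` are illustrative names, not a library API.

```python
import re

# Hypothetical NP rule: optional determiner, any adjectives, one or more nouns (NN, NNS, NNP, ...).
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[A-Z]*>)+")

def chunk_noun_phrases(tagged):
    """Return the word groups in `tagged` (a list of (word, tag) pairs) that match NP_PATTERN."""
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN><VBD>".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        # Each tag contributes exactly one "<", so counting them maps offsets to token indices.
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(sentence))
# [['the', 'little', 'yellow', 'dog'], ['the', 'cat']]
```

Encoding tags as `<TAG>` tokens keeps every match aligned to tag boundaries, because no tag contains `<` or `>`.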

What is chunking in NLTK?

In NLTK, chunking is used to group tokens into chunks. The result depends on the grammar you choose. NLTK's chunking tools can also be used to search for tag patterns and to explore text corpora.
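Chunker output is often written as IOB tags: `B-` marks the first token of a chunk, `I-` a token inside it, and `O` a token outside any chunk. NLTK's `nltk.chunk.tree2conlltags` produces such (word, tag, IOB) triples from a chunk tree. The sketch below is plain Python, not NLTK's implementation: `spans_to_iob` is a hypothetical helper that assumes chunks are given as (start, end) token spans with an exclusive end.

```python
def spans_to_iob(tagged, spans, label="NP"):
    """Turn (start, end) chunk spans over `tagged` into (word, pos, iob) triples; end is exclusive."""
    iob = ["O"] * len(tagged)              # every token starts outside any chunk
    for start, end in spans:
        iob[start] = f"B-{label}"          # first token of the chunk
        for i in range(start + 1, end):
            iob[i] = f"I-{label}"          # remaining tokens inside the chunk
    return [(word, pos, tag) for (word, pos), tag in zip(tagged, iob)]

tagged = [("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]
for triple in spans_to_iob(tagged, [(0, 2), (4, 6)]):
    print(triple)
# ('the', 'DT', 'B-NP')
# ('cat', 'NN', 'I-NP')
# ('sat', 'VBD', 'O')
# ('on', 'IN', 'O')
# ('the', 'DT', 'B-NP')
# ('mat', 'NN', 'I-NP')
```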


2 Answers

Chunking is also called shallow parsing, and it's basically the identification of parts of speech and short phrases (like noun phrases). Part-of-speech tagging tells you whether words are nouns, verbs, adjectives, etc., but it doesn't give you any clue about the structure of the sentence or the phrases in it. Sometimes it's useful to have more information than just the parts of speech of words, but you don't need the full parse tree that you would get from parsing.
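A minimal sketch of the difference between POS tagging alone and shallow parsing, assuming hand-assigned Penn Treebank-style tags and a toy noun-phrase rule (optional determiner, any adjectives, one or more nouns). `pos_view` and `shallow_parse` are hypothetical helpers, not a library API. The output is flat: chunks group adjacent words but never nest, unlike a full parse tree.

```python
def pos_view(tagged):
    """What POS tagging alone gives you: one label per word, no grouping."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)

def shallow_parse(tagged):
    """Greedy, flat NP chunking (DT? JJ* NN+); returns text with each chunk bracketed."""
    out, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if tagged[j][1] == "DT":                      # optional determiner
            j += 1
        while j < n and tagged[j][1] == "JJ":         # any number of adjectives
            j += 1
        k = j
        while k < n and tagged[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:   # at least one noun: tokens i..k-1 form an NP chunk
            out.append("[NP " + " ".join(w for w, _ in tagged[i:k]) + "]")
            i = k
        else:       # no noun follows: leave this token outside any chunk
            out.append(tagged[i][0])
            i += 1
    return " ".join(out)

tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(pos_view(tagged))
# the/DT little/JJ yellow/JJ dog/NN barked/VBD at/IN the/DT cat/NN
print(shallow_parse(tagged))
# [NP the little yellow dog] barked at [NP the cat]
```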

An example of when chunking might be preferable is named entity recognition (NER). In NER, your goal is to find named entities, which tend to be noun phrases (though not always), so you would want to know that "President Barack Obama" appears in the following sentence:

President Barack Obama criticized insurance companies and banks as he urged supporters to pressure Congress to back his moves to revamp the health-care system and overhaul financial regulations. (source)

But you wouldn't necessarily care that he is the subject of the sentence.
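As a rough illustration, assume the sentence has already been tagged (the Penn Treebank tags below are hand-assigned, for the first part of the sentence only). Simply grouping consecutive proper-noun tags recovers "President Barack Obama" as one unit without any analysis of subject or object. `proper_noun_chunks` is a hypothetical helper; real NER systems typically rely on trained models rather than a single tag rule.

```python
from itertools import groupby

def proper_noun_chunks(tagged):
    """Join maximal runs of proper-noun tags (NNP/NNPS) into candidate entity strings."""
    candidates = []
    for is_proper, group in groupby(tagged, key=lambda pair: pair[1] in ("NNP", "NNPS")):
        if is_proper:
            candidates.append(" ".join(word for word, _ in group))
    return candidates

tagged = [("President", "NNP"), ("Barack", "NNP"), ("Obama", "NNP"),
          ("criticized", "VBD"), ("insurance", "NN"), ("companies", "NNS"),
          ("and", "CC"), ("banks", "NNS"), ("as", "IN"), ("he", "PRP"),
          ("urged", "VBD"), ("supporters", "NNS"), ("to", "TO"),
          ("pressure", "VB"), ("Congress", "NNP")]
print(proper_noun_chunks(tagged))  # ['President Barack Obama', 'Congress']
```

A general noun-phrase chunker would also pick out "insurance companies", so chunks are best treated as entity candidates rather than entities.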

Chunking has also been fairly commonly used as a preprocessing step for other tasks like example-based machine translation, natural language understanding, speech generation, and others.

ealdent answered Sep 22 '22 04:09


For "text chunking" in natural language processing, see here (you probably want all the lectures in this series, as a kind of "NLP 101"): it covers a series of tasks, such as finding noun groups, finding verb groups, and completely partitioning a sentence into chunks of several types. The lecture I linked goes into more detail!
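A minimal sketch of the "complete partitioning" task, assuming hand-assigned Penn Treebank-style tags and three toy rules (noun groups, verb groups, single prepositions) chosen only for illustration; `CHUNK_RULES` and `partition` are hypothetical names, not part of any library or of the linked lectures. The rules are regular expressions over a string of `<TAG>` tokens, tried as alternatives at each position, and `Match.lastgroup` reports which named rule matched. Tokens that no rule covers are labeled `O`.

```python
import re

# Hypothetical chunk rules, tried as alternatives at each position.
CHUNK_RULES = re.compile(
    r"(?P<NP>(?:<DT>)?(?:<JJ>)*(?:<NN[A-Z]*>)+)"  # noun group
    r"|(?P<VP>(?:<MD>)?(?:<VB[A-Z]*>)+)"           # verb group: optional modal, then verbs
    r"|(?P<PP><IN>)"                               # preposition
)

def partition(tagged):
    """Split a tagged sentence into consecutive (label, words) chunks; unmatched tokens get 'O'."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    result, pos = [], 0  # pos = index of the next token not yet assigned to a chunk
    for m in CHUNK_RULES.finditer(tag_string):
        start = tag_string.count("<", 0, m.start())  # one "<" per tag -> token index
        end = start + m.group().count("<")
        result.extend(("O", [word]) for word, _ in tagged[pos:start])
        result.append((m.lastgroup, [word for word, _ in tagged[start:end]]))
        pos = end
    result.extend(("O", [word]) for word, _ in tagged[pos:])
    return result

sentence = [("the", "DT"), ("cat", "NN"), ("will", "MD"), ("have", "VB"),
            ("eaten", "VBN"), ("the", "DT"), ("fish", "NN"), ("in", "IN"),
            ("the", "DT"), ("garden", "NN"), (".", ".")]
for label, words in partition(sentence):
    print(label, words)
# NP ['the', 'cat']
# VP ['will', 'have', 'eaten']
# NP ['the', 'fish']
# PP ['in']
# NP ['the', 'garden']
# O ['.']
```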

Alex Martelli answered Sep 22 '22 04:09