In Natural language processing, what is the purpose of chunking?
The term comes from cognitive psychology: by grouping individual data points into a larger whole, you can increase the amount of information you can remember. Probably the most familiar everyday example is phone numbers, where a digit sequence like 4-7-1-1-3-2-4 is chunked into 471-1324. Chunking in NLP borrows the same idea, grouping adjacent tokens into larger units.
In NLTK, chunking is done with the help of regular expressions over part-of-speech tags. The key operator is `*`, which means the preceding element can occur zero or more times, so it may or may not be present at all. For example, `ab*` matches an `a` followed by any number of `b`'s: it matches `a`, `ab`, `abb`, `abbb`, and so on.
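The behavior of `*` is easy to check with Python's built-in `re` module (the test strings here are just for illustration):

```python
import re

# "a" followed by zero or more "b"s; the "b" may be absent entirely.
pattern = re.compile(r"ab*")

for s in ["a", "ab", "abbb", "abc"]:
    m = pattern.match(s)
    print(s, "->", m.group() if m else "no match")
# "abc" still matches, but only its "ab" prefix is consumed.
```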
Chunking is the process of grouping related words together based on their parts of speech. To do this, you define a grammar that specifies which sequences of tags (determiners, adjectives, nouns, and so on) should be folded into a chunk.
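NLTK's `RegexpParser` is the usual tool for this, but the mechanics can be sketched in plain Python with the standard `re` module and no NLTK dependency. The grammar below, an optional determiner, any number of adjectives, then a noun (the NLTK-style rule would be `NP: {<DT>?<JJ>*<NN>}`), and the tagged sentence are assumptions for illustration:

```python
import re

def chunk_np(tagged):
    """Group (word, tag) pairs into noun-phrase chunks matching DT? JJ* NN."""
    # Encode the tag sequence as one string, one "<TAG>" per token.
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    chunks = []
    for m in re.finditer(r"(?:<DT>)?(?:<JJ>)*<NN>", tag_string):
        # Each token contributes exactly one "<", so counting "<" maps
        # character offsets back to token indices.
        start = tag_string[:m.start()].count("<")
        end = tag_string[:m.end()].count("<")
        chunks.append([w for w, _ in tagged[start:end]])
    return chunks

tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_np(tagged))
# [['the', 'little', 'yellow', 'dog'], ['the', 'cat']]
```

With NLTK installed, the equivalent would be `nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}").parse(tagged)`, which returns a tree rather than flat lists.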
Chunking is used to categorize different tokens into the same chunk, and the result depends on the grammar you select. In NLTK, chunking is also used to tag patterns and to explore text corpora.
Chunking is also called shallow parsing: it identifies parts of speech and short phrases (like noun phrases). Part-of-speech tagging tells you whether words are nouns, verbs, adjectives, etc., but it doesn't give you any clue about the structure of the sentence or of the phrases in it. Sometimes it's useful to have more information than just the parts of speech of words, but you don't need the full parse tree that you would get from a complete parse.
An example of a task where chunking might be preferable to full parsing is named entity recognition (NER). In NER, your goal is to find named entities, which tend to be noun phrases (though they aren't always), so you would want to know that President Barack Obama appears in the following sentence:
President Barack Obama criticized insurance companies and banks as he urged supporters to pressure Congress to back his moves to revamp the health-care system and overhaul financial regulations. (source)
But you wouldn't necessarily care that he is the subject of the sentence.
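Under that view, NER-as-chunking can be sketched with a grammar that groups runs of consecutive proper nouns (`NNP` tags). The tagged input below is hand-written for illustration, not the output of a real tagger:

```python
import re

def proper_noun_chunks(tagged):
    """Return runs of consecutive NNP-tagged words as candidate named entities."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    entities = []
    for m in re.finditer(r"(?:<NNP>)+", tag_string):
        # Count "<" to map character offsets back to token indices.
        start = tag_string[:m.start()].count("<")
        end = tag_string[:m.end()].count("<")
        entities.append(" ".join(w for w, _ in tagged[start:end]))
    return entities

tagged = [("President", "NNP"), ("Barack", "NNP"), ("Obama", "NNP"),
          ("criticized", "VBD"), ("insurance", "NN"), ("companies", "NNS"),
          ("and", "CC"), ("banks", "NNS")]
print(proper_noun_chunks(tagged))
# ['President Barack Obama']
```

A real NER system would go further (classifying entity types, handling entities that are not proper-noun runs), but the chunking step gives you the candidate phrases without ever building a full parse tree.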
Chunking has also been fairly commonly used as a preprocessing step for other tasks like example-based machine translation, natural language understanding, speech generation, and others.
For "text chunking" in natural language processing, see the lecture linked in the original answer (you probably want all the lectures in that series as a kind of "NLP 101"). Chunking spans a series of tasks such as finding noun groups, finding verb groups, and completely partitioning a sentence into chunks of several types; the lecture goes into more detail.