What Is the Difference Between POS Tagging and Shallow Parsing?


I'm currently taking a Natural Language Processing course at my university and am still confused about some basic concepts. I understand the definition of POS tagging from the book Foundations of Statistical Natural Language Processing:

Tagging is the task of labeling (or tagging) each word in a sentence with its appropriate part of speech. We decide whether each word is a noun, verb, adjective, or whatever.
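To make the definition concrete, here is a toy sketch of what a tagger's output looks like: one label per token, nothing more. The tiny lexicon and the fallback tag "NN" are illustrative assumptions; real taggers are trained statistically and use the surrounding words to choose between ambiguous tags. The tag names are Penn Treebank tags, as used in one of the answers below.

```python
# Toy POS tagger: look each token up in a tiny hand-made lexicon.
# LEXICON and the "NN" fallback are illustrative assumptions, not a real model.
LEXICON = {
    "my": "PRP$", "his": "PRP$", "dog": "NN", "food": "NN",
    "likes": "VBZ", ".": ".",
}

def pos_tag(tokens, default="NN"):
    """Return exactly one (word, tag) pair per input token."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]

tagged = pos_tag(["My", "dog", "likes", "his", "food", "."])
print(" ".join(f"{w}/{t}" for w, t in tagged))
# My/PRP$ dog/NN likes/VBZ his/PRP$ food/NN ./.
```

The output is flat: it says what kind of word each token is, but nothing about which words belong together.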

But I can't find a definition of shallow parsing in the book, even though it describes shallow parsing as one of the uses of POS tagging. So I searched the web and found no direct explanation of shallow parsing, except this one in Wikipedia:

Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which identifies the constituents (noun groups, verbs, verb groups, etc.), but does not specify their internal structure, nor their role in the main sentence.
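Because chunks have no internal structure, they can themselves be written as one tag per token, which is part of why the two tasks look so similar. A common encoding is IOB (B- marks the first word of a chunk, I- a word inside it, O a word outside any chunk), used for example in the CoNLL-2000 chunking shared task. The minimal sketch below assumes chunks are given as (label, words) spans; the key difference from POS tags is that these labels describe group membership, not word class.

```python
# Chunks as (label, [words]) spans; None marks a token outside any chunk.
chunks = [("NP", ["My", "dog"]), ("VP", ["likes"]),
          ("NP", ["his", "food"]), (None, ["."])]

def to_iob(chunks):
    """Flatten chunk spans into one B-/I-/O tag per token."""
    tagged = []
    for label, words in chunks:
        for i, word in enumerate(words):
            if label is None:
                tagged.append((word, "O"))
            else:
                prefix = "B" if i == 0 else "I"
                tagged.append((word, f"{prefix}-{label}"))
    return tagged

print(to_iob(chunks))
# [('My', 'B-NP'), ('dog', 'I-NP'), ('likes', 'B-VP'), ('his', 'B-NP'), ('food', 'I-NP'), ('.', 'O')]
```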

Frankly, I don't see the difference, but that may be because of my English or because I'm not understanding a simple, basic concept. Can anyone please explain the difference between shallow parsing and POS tagging? Is shallow parsing also often called shallow semantic parsing?

Thanks in advance.


asked Jan 25 '12 07:01

bertzzie

3 Answers

POS tagging would give a POS tag to each and every word in the input sentence.

Parsing the sentence (using the Stanford PCFG parser, for example) would convert the sentence into a tree whose leaves hold POS tags (which correspond to the words in the sentence), while the rest of the tree tells you exactly how these words join together to make the overall sentence. For example, an adjective and a noun might combine into a 'Noun Phrase', which might combine with another adjective to form another Noun Phrase (e.g. quick brown fox). The exact way the pieces combine depends on the parser in question.
You can see what parser output looks like at http://nlp.stanford.edu:8080/parser/index.jsp
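As a rough sketch of the structure described above, the tree for "quick brown fox" can be written as nested tuples and printed in the bracket notation that parsers such as the Stanford parser typically display. The tuple representation and the to_brackets helper are assumptions for illustration, and the bracketing follows the description in this answer; a real parser may group the phrase differently.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# A preterminal is (POS tag, word). Structure follows the answer's description.
tree = ("NP",
        ("JJ", "quick"),
        ("NP", ("JJ", "brown"), ("NN", "fox")))

def to_brackets(node):
    """Render a tree in Penn-Treebank-style bracket notation."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return f"({label} {children[0]})"  # preterminal: (TAG word)
    return f"({label} " + " ".join(to_brackets(c) for c in children) + ")"

print(to_brackets(tree))
# (NP (JJ quick) (NP (JJ brown) (NN fox)))
```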

A shallow parser or 'chunker' comes somewhere in between these two. A plain POS tagger is really fast but does not give you enough information, and a full-blown parser is slow and gives you too much. A POS tagger can be thought of as a parser which only returns the bottom-most tier of the parse tree to you. A chunker might be thought of as a parser that returns some other tier of the parse tree to you instead. Sometimes you just need to know that a bunch of words together form a Noun Phrase, but don't care about the sub-structure of the tree within those words (i.e. which words are adjectives, determiners, nouns, etc., and how they combine). In such cases you can use a chunker to get exactly the information you need instead of wasting time generating the full parse tree for the sentence.
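The "tiers" analogy can be illustrated by reading two different levels off one hand-built tree: the preterminals give the POS tagger's view, and the outermost NP/VP nodes, flattened to their words, give a chunker's view with the inner NP discarded. This only illustrates the analogy; as noted above, a real chunker produces chunks directly without building the full tree. The tree, the tuple format, and the helper names are assumptions, and the outermost-phrase rule only works here because the VP has no object inside it.

```python
# Nested-tuple tree for "The quick brown fox jumps" (hand-built, hypothetical).
tree = ("S",
        ("NP", ("DT", "The"), ("JJ", "quick"),
               ("NP", ("JJ", "brown"), ("NN", "fox"))),
        ("VP", ("VBZ", "jumps")))

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def pos_tier(node):
    """Bottom tier: the (word, tag) pairs, i.e. what a POS tagger returns."""
    if is_preterminal(node):
        return [(node[1], node[0])]
    return [pair for child in node[1:] for pair in pos_tier(child)]

def chunk_tier(node, phrase_labels=("NP", "VP")):
    """A higher tier: outermost NP/VP nodes, flattened (no internal structure)."""
    if node[0] in phrase_labels:
        return [(node[0], [word for word, _ in pos_tier(node)])]
    if is_preterminal(node):
        return []  # a word outside any phrase
    return [c for child in node[1:] for c in chunk_tier(child, phrase_labels)]

print(pos_tier(tree))
# [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ')]
print(chunk_tier(tree))
# [('NP', ['The', 'quick', 'brown', 'fox']), ('VP', ['jumps'])]
```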


answered Oct 10 '22 20:10

Aditya Mukherji

POS tagging is the process of deciding the type of every token in a text, e.g. NOUN, VERB, DETERMINER, etc. A token can be a word or punctuation.
Shallow parsing or chunking, meanwhile, is the process of dividing a text into syntactically related groups.

POS tagging output

My/PRP$ dog/NN likes/VBZ his/PRP$ food/NN ./.

Chunking output

[NP My dog] [VP likes] [NP his food]
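One way to get from the first output to the second is a rule-based chunker that scans the POS tags and groups runs that match simple patterns. The rules below are hypothetical and deliberately minimal (NP: optional determiner or possessive, any adjectives, one or more nouns; VP: a run of verb tags); real chunkers are usually trained or use richer grammars. NLTK, for instance, offers nltk.RegexpParser for this style of tag-pattern chunking.

```python
def chunk(tagged):
    """Greedy chunker over Penn Treebank tags (toy rules, see lead-in)."""
    chunks, i = [], 0
    while i < len(tagged):
        # Try NP: optional DT/PRP$, then adjectives, then at least one noun.
        j = i + 1 if tagged[i][1] in ("DT", "PRP$") else i
        while j < len(tagged) and tagged[j][1] == "JJ":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1
        if k > j:
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Try VP: one or more verb tags.
        k = i
        while k < len(tagged) and tagged[k][1].startswith("VB"):
            k += 1
        if k > i:
            chunks.append(("VP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        chunks.append((None, [tagged[i][0]]))  # token outside any chunk
        i += 1
    return chunks

tagged = [("My", "PRP$"), ("dog", "NN"), ("likes", "VBZ"),
          ("his", "PRP$"), ("food", "NN"), (".", ".")]
result = chunk(tagged)
print(" ".join(f"[{label} {' '.join(words)}]" for label, words in result if label))
# [NP My dog] [VP likes] [NP his food]
```

Note that the chunker never says how "My" relates to "dog" inside the NP; it only draws the boundaries.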

answered Oct 10 '22 21:10

Khairul

The Constraint Grammar framework is illustrative. In its simplest, crudest form, it takes as input POS-tagged text, and adds what you could call Part of Clause tags. For an adjective, for example, it could add @NN> to indicate that it is part of an NP whose head word is to the right.
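A very small sketch of the idea, assuming Penn Treebank input tags: a single hand-written rule adds "@NN>" to an adjective when a noun follows it (possibly after further adjectives), so attributive adjectives get the tag and a predicative one does not. This is a toy, not the Constraint Grammar formalism itself, which applies many contextual rules that add and remove candidate tags; the tag name simply follows this answer's example, and add_clause_tags is a hypothetical helper.

```python
def add_clause_tags(tagged):
    """Append '@NN>' to adjectives whose head noun is to the right (toy rule)."""
    out = []
    for i, (word, tag) in enumerate(tagged):
        labels = [tag]
        if tag == "JJ":
            # Skip over any further adjectives and look for a noun head.
            j = i + 1
            while j < len(tagged) and tagged[j][1] == "JJ":
                j += 1
            if j < len(tagged) and tagged[j][1].startswith("NN"):
                labels.append("@NN>")
        out.append((word, labels))
    return out

sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"),
            ("fox", "NN"), ("is", "VBZ"), ("fast", "JJ")]
print(add_clause_tags(sentence))
# [('The', ['DT']), ('quick', ['JJ', '@NN>']), ('brown', ['JJ', '@NN>']), ('fox', ['NN']), ('is', ['VBZ']), ('fast', ['JJ'])]
```

The POS tag is unchanged; the added tag records the word's role in its phrase, which is the kind of information a chunker provides.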

answered Oct 10 '22 21:10

tripleee
