
Identify prepositions and individual POS

I am trying to find the correct part of speech for each word in a paragraph. I am using the Stanford POS Tagger. However, I am stuck at one point.

I want to identify the prepositions in the paragraph.

The Penn Treebank tagset says:

IN  Preposition or subordinating conjunction

How can I be sure whether the current word is a preposition or a subordinating conjunction? How can I extract only the prepositions from a paragraph in this case?

asked Sep 28 '22 at 20:09 by swapyonubuntu



2 Answers

You can't be sure. The reason for this somewhat strange tag is that it's really hard to determine automatically whether, for example, "for" is a preposition or a subordinating conjunction. So, to give automatic taggers better precision, the distinction is simply ignored. Note that there is also a tag TO, which is assigned to every occurrence of "to", regardless of whether it functions as a preposition, an infinitive particle, or something else (I think there are other such functions).

If you need to identify prepositions properly, you need to retrain a tagger with a modified tag set, or perhaps train a classifier that takes POS-tagged text and performs only this final disambiguation.
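A minimal sketch of the feature-extraction half of such a classifier, in Python. The function name in_token_features and the particular features (neighbouring tags, whether a verb follows soon after) are my own illustrative assumptions, not something prescribed by the tagset; you would still need hand-labelled examples (preposition vs. subordinating conjunction) to train on.

```python
from typing import Dict, List, Tuple


def in_token_features(tagged: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Build one feature dict per IN token (hypothetical feature set).

    The labels to learn (preposition vs. subordinating conjunction) must
    come from data you annotate yourself; this only prepares the inputs.
    """
    features: List[Dict[str, object]] = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "IN":
            continue
        prev_tag = tagged[i - 1][1] if i > 0 else "<S>"
        window = [t for _, t in tagged[i + 1:i + 4]]  # up to three following tags
        features.append({
            "word": word.lower(),
            "prev_tag": prev_tag,
            "next_tag": window[0] if window else "</S>",
            # A verb shortly after the IN token hints that a clause follows,
            # which is typical of a subordinating conjunction.
            "verb_within_3": any(t.startswith("VB") for t in window),
        })
    return features
```

Dicts like these can be vectorised with scikit-learn's DictVectorizer and passed to any standard classifier; the choice of learner is up to you.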

answered Oct 03 '22 at 07:10 by lenz



I have made some progress in determining whether a word is actually a preposition or a subordinating conjunction.

I parsed the following sentence:

She left early because Mike arrived with his new girlfriend.

(here "because" is a subordinating conjunction)

After POS tagging:

She_PRP left_VBD early_RB because_IN Mike_NNP arrived_VBD with_IN his_PRP$ new_JJ girlfriend_NN ._.
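A small Python sketch that reads the tagger's word_TAG output (the format of the tagged sentence above) and pulls out every IN token. It illustrates the problem in the question: "because" and "with" come back together, since the tag alone does not separate them. The function name in_tokens is hypothetical.

```python
from typing import List


def in_tokens(tagged_text: str) -> List[str]:
    """Return, in order, the words the tagger labelled IN."""
    words: List[str] = []
    for token in tagged_text.split():
        # rpartition splits on the LAST "_", so "._." -> (".", "_", ".")
        # and "his_PRP$" -> ("his", "_", "PRP$").
        word, _, tag = token.rpartition("_")
        if tag == "IN":
            words.append(word)
    return words


tagged = ("She_PRP left_VBD early_RB because_IN Mike_NNP arrived_VBD "
          "with_IN his_PRP$ new_JJ girlfriend_NN ._.")
print(in_tokens(tagged))  # ['because', 'with']
```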

Here, to determine whether "because" is a preposition, I parsed the sentence.

[Image: Parse Tree for Sentence 1]

Here, the direct parent of the IN node for "because" is SBAR (subordinate clause).

"with" is also tagged IN, but its direct parent is PP (prepositional phrase), so it is a preposition.
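The parent-label rule can be sketched in Python by scanning a bracketed parse with a stack of open constituent labels. Assumptions: the tree is in Stanford/Penn bracketed form with a labelled ROOT, and the SENTENCE_1 string below is hand-written to resemble what a constituency parser would produce for this sentence; it is not actual parser output. The names in_parents and rule are hypothetical.

```python
import re
from typing import List, Tuple

TOKEN = re.compile(r"\(|\)|[^\s()]+")


def in_parents(tree: str) -> List[Tuple[str, str]]:
    """Return (word, parent label) for every (IN word) leaf in the tree."""
    tokens = TOKEN.findall(tree)
    stack: List[str] = []  # labels of the currently open phrases
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            label = tokens[i + 1]
            # A preterminal looks like: "(" TAG WORD ")"
            if tokens[i + 2] not in ("(", ")") and tokens[i + 3] == ")":
                if label == "IN":
                    found.append((tokens[i + 2], stack[-1] if stack else "<none>"))
                i += 4
            else:
                stack.append(label)  # open a phrase such as PP or SBAR
                i += 2
        else:  # ")" closes the innermost open phrase
            stack.pop()
            i += 1
    return found


def rule(parent: str) -> str:
    """The heuristic from this answer: PP -> preposition, SBAR -> conjunction."""
    if parent == "PP":
        return "preposition"
    if parent == "SBAR":
        return "subordinating conjunction"
    return "unclear"


# Hand-written illustrative parse (not real parser output).
SENTENCE_1 = (
    "(ROOT (S (NP (PRP She)) (VP (VBD left) (ADVP (RB early)) "
    "(SBAR (IN because) (S (NP (NNP Mike)) (VP (VBD arrived) "
    "(PP (IN with) (NP (PRP$ his) (JJ new) (NN girlfriend))))))) "
    "(. .)))"
)

for word, parent in in_parents(SENTENCE_1):
    print(word, parent, rule(parent))
# because SBAR subordinating conjunction
# with PP preposition
```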

Example 2:

Keep your hand on the wound until the nurse asks you to take it off. (here "until" is a subordinating conjunction)

The POS tagging is:

Keep_VB your_PRP$ hand_NN on_IN the_DT wound_NN until_IN the_DT nurse_NN asks_VBZ you_PRP to_TO take_VB it_PRP off_RP ._.

So both "until" and "on" are tagged IN.

However, the picture becomes clearer when we actually parse the sentence.
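To extract only the prepositions, which is what the original question asks for, one option is to read each parse into nested lists and keep the IN words whose direct parent is PP. This Python sketch assumes Stanford-style bracketed trees with a labelled ROOT; EXAMPLE_2 is a hand-written approximation of a parse for Example 2, not real parser output, and read_tree and prepositions are hypothetical names. The rule is only as good as the parse itself, which may be why some "before"/"after" cases fail.

```python
import re
from typing import List


def read_tree(text: str) -> list:
    """Turn a bracketed parse into nested lists, e.g. ['PP', ['IN', 'on'], ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> list:
        nonlocal pos
        pos += 1                    # consume "("
        node: list = [tokens[pos]]  # constituent label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())     # nested constituent
            else:
                node.append(tokens[pos])  # a word
                pos += 1
        pos += 1                    # consume ")"
        return node

    return read()


def prepositions(tree: list) -> List[str]:
    """Words tagged IN whose direct parent phrase is PP."""
    found: List[str] = []
    label = tree[0]
    for child in tree[1:]:
        if isinstance(child, list):
            if child[0] == "IN" and label == "PP":
                found.append(child[1])
            found.extend(prepositions(child))
    return found


# Hand-written illustrative parse of Example 2 (not real parser output).
EXAMPLE_2 = (
    "(ROOT (S (VP (VB Keep) (NP (PRP$ your) (NN hand)) "
    "(PP (IN on) (NP (DT the) (NN wound))) "
    "(SBAR (IN until) (S (NP (DT the) (NN nurse)) "
    "(VP (VBZ asks) (S (NP (PRP you)) "
    "(VP (TO to) (VP (VB take) (NP (PRP it)) (PRT (RP off))))))))) "
    "(. .)))"
)

print(prepositions(read_tree(EXAMPLE_2)))  # ['on']
```

For a whole paragraph, parse each sentence and concatenate the results, e.g. [w for t in trees for w in prepositions(read_tree(t))].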

So, finally, I conclude that "because" is a subordinating conjunction and "with" is a preposition.

I tried this on many variations of sentences, and it worked for almost all of them except some cases involving "before" and "after".

[Image: Example 2]

answered Oct 03 '22 at 07:10 by swapyonubuntu
