Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Horizontal Markovization

I have to implement horizontal markovization (NLP concept) and I'm having a little trouble understanding what the trees will look like. I've been reading the Klein and Manning paper, but they don't explain what the trees with horizontal markovization of order 2 or order 3 will look like. Could someone shed some light on the algorithm and what the trees are SUPPOSED to look like? I'm relatively new to NLP.

like image 860
Josh Bradley Avatar asked Oct 14 '12 16:10

Josh Bradley


1 Answers

So, let's say you have a bunch of flat rules like:

NP 
    NNP
    NNP
    NNP
    NNP

or

VP
   V
   Det
   NP

When you binarize these you want to keep the context (i.e. this isn't just a Det but specifically a Det following a Verb as part of a VP). To do so normally you use annotations like this:

NP 
    NNP
    NP->NNP
        NNP
        NP->NNP->NNP
            NNP
            NP->NNP->NNP->NNP
                NNP

or

VP
   V
   VP->V
       Det
       VP->V->Det
          NP

You need to binarize the tree, but these annotations are not always very meaningful. They might be somewhat meaningful for the Verb Phrase example, but all you really care about for the other one is that a noun phrase can be a fairly long string of proper nouns (e.g. "Peter B. Lewis Building" or "Hope Memorial Bridge Project Anniversary"). So with Horizontal Markovization you will collapse some of the annotations slightly, throwing away some of the context. The order of Markovization is the amount of context you are going to retain. So with the normal annotations you are basically at infinite order: choosing to retain all context and collapse nothing.

Order 0 means you're going to drop all of the context and you get a tree without the fancy annotations, like this:

NP 
    NNP
    NNP
        NNP
        NNP
            NNP
            NNP
                NNP

Order 1 means you'll retain only one term of context and you get a tree like this:

NP 
    NNP
    NP->...NNP  **one term: NP->**
        NNP
        NP->...NNP  **one term: NP->**
            NNP
            NP->...NNP  **one term: NP->**
                NNP

Order 2 means you'll retain two terms of context and you get a tree like this:

NP 
    NNP
    NP->NNP  **two terms: NP->NNP**
        NNP
        NP->NNP->...NNP  **two terms: NP->NNP->**
            NNP
            NP->NNP->...NNP  **two terms: NP->NNP->**
                NNP
like image 158
FoolishSeth Avatar answered Oct 18 '22 01:10

FoolishSeth