 

How to make a template file of CRF++?

Tags: crf, crf++

I'm new to CRF++ and am teaching myself from its manual: http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ

And I don't understand what this means:

This is a template to describe unigram features. When you give a template "U01:%x[0,1]", CRF++ automatically generates a set of feature functions (func1 ... funcN) like:

func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1 else return 0
funcXY = if (output = O and feature="U01:NN") return 1 else return 0
...

The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.

Why are there so many lines for a single unigram feature template, and what do they mean?

asked Aug 25 '14 by user1610952


1 Answer

After looking at the documentation for long enough, I think I figured it out.

Take the example in the documentation where the input data is:

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP 
current   JJ   I-NP 
account   NN   I-NP

and the feature template in question is %x[0,1]. The macro format is %x[row,col], where row is an offset relative to the current token and col is the column index.
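To make the %x[row,col] lookup concrete, here is a minimal Python sketch (not CRF++'s own code) that expands a template string at each token position of the sample data. The function name `expand` is hypothetical, and the out-of-range placeholders `_B-1` / `_B+1` are an assumption about CRF++'s convention for rows before the start or past the end of a sentence; %x[0,1] itself never goes out of range.

```python
import re

# Sample data from the documentation: word, POS tag, chunk label.
TOKENS = [
    ["He", "PRP", "B-NP"],
    ["reckons", "VBZ", "B-VP"],
    ["the", "DT", "B-NP"],
    ["current", "JJ", "I-NP"],
    ["account", "NN", "I-NP"],
]

# Matches macros such as %x[0,1] or %x[-1,0].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template: str, tokens: list[list[str]], pos: int) -> str:
    """Replace every %x[row,col] in template with the value at (pos + row, col)."""

    def lookup(match: re.Match) -> str:
        row, col = int(match.group(1)), int(match.group(2))
        target = pos + row  # row is relative to the current token
        if target < 0:
            return f"_B{target}"  # assumed placeholder, e.g. _B-1 before the start
        if target >= len(tokens):
            return f"_B+{target - len(tokens) + 1}"  # assumed placeholder, e.g. _B+1
        return tokens[target][col]  # col is an absolute column index

    return MACRO.sub(lookup, template)


for i in range(len(TOKENS)):
    print(i, expand("U01:%x[0,1]", TOKENS, i))
# prints 0 U01:PRP, 1 U01:VBZ, 2 U01:DT, 3 U01:JJ, 4 U01:NN (one per line)
```

Each position yields exactly one string, and the set of distinct strings collected over the data is what the feature functions are built from.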

When %x[0,1] is expanded, then depending on the current token it yields one of the strings in the set [PRP, VBZ, DT, JJ, NN] (i.e. one of the unique strings from column 1, where the leftmost column is column 0). For each of these strings, CRF++ creates a set of feature functions of the following form (taking the 3rd row of the input data as an example):

func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O    and feature="U01:DT") return 1 else return 0
...

where that particular string (DT in the code above) is paired with every single output class, giving one binary function per (class, string) pair.
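A small Python sketch of what one of these indicator functions amounts to: a closure that returns 1 only when both the output class and the observed feature string match. The helper name `make_feature_function` is hypothetical, and the label set [B-NP, I-NP, O] is the one this answer assumes.

```python
from typing import Callable

# Label set assumed in this answer (matches the documentation's func examples).
LABELS = ["B-NP", "I-NP", "O"]


def make_feature_function(label: str, feature: str) -> Callable[[str, str], int]:
    """Return f(output, observed_feature) that is 1 only when both match."""

    def func(output: str, observed_feature: str) -> int:
        return 1 if output == label and observed_feature == feature else 0

    return func


# func1..func3: the string "U01:DT" paired with each output class.
funcs = [make_feature_function(label, "U01:DT") for label in LABELS]

# At row 3 ("the DT B-NP") the observed feature is "U01:DT" and the output is B-NP,
# so only the B-NP function fires.
print([f("B-NP", "U01:DT") for f in funcs])  # [1, 0, 0]
```

During training, CRF++ learns a weight for each such function; the function itself is only an on/off test.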

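So if the output classes are [B-NP, I-NP, O] (the classes used in the documentation's example; strictly, CRF++ takes the label set from the last column of the training data, which in these five rows would also include B-VP), the feature template expanded into feature functions will look like: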

# Row 1 (He, PRP, B-NP)
func1 = if (output = B-NP and feature="U01:PRP") return 1 else return 0
func2 = if (output = I-NP and feature="U01:PRP") return 1 else return 0
func3 = if (output = O    and feature="U01:PRP") return 1 else return 0

# Row 2 (reckons, VBZ, B-VP)
func4 = if (output = B-NP and feature="U01:VBZ") return 1 else return 0
func5 = if (output = I-NP and feature="U01:VBZ") return 1 else return 0
func6 = if (output = O    and feature="U01:VBZ") return 1 else return 0

# Row 3 (the, DT, B-NP)
func7 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func8 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func9 = if (output = O    and feature="U01:DT") return 1 else return 0

# Row 4 (current, JJ, I-NP)
func10 = if (output = B-NP and feature="U01:JJ") return 1 else return 0
func11 = if (output = I-NP and feature="U01:JJ") return 1 else return 0
func12 = if (output = O    and feature="U01:JJ") return 1 else return 0

# Row 5 (account, NN, I-NP)
func13 = if (output = B-NP and feature="U01:NN") return 1 else return 0
func14 = if (output = I-NP and feature="U01:NN") return 1 else return 0
func15 = if (output = O    and feature="U01:NN") return 1 else return 0
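The listing above can be reproduced with a short Python sketch that takes the distinct column-1 strings and pairs each with every output class. This is an illustration under the same assumed label set, not CRF++'s internal code; `unique_features` is a hypothetical helper.

```python
# Sample data from the documentation: word, POS tag, chunk label.
TOKENS = [
    ["He", "PRP", "B-NP"],
    ["reckons", "VBZ", "B-VP"],
    ["the", "DT", "B-NP"],
    ["current", "JJ", "I-NP"],
    ["account", "NN", "I-NP"],
]
LABELS = ["B-NP", "I-NP", "O"]  # label set assumed in this answer


def unique_features(tokens: list[list[str]], col: int = 1, prefix: str = "U01") -> list[str]:
    """Expanded strings for %x[0,col], in first-seen order, without duplicates."""
    seen: list[str] = []
    for tok in tokens:
        feature = f"{prefix}:{tok[col]}"
        if feature not in seen:  # each distinct string is expanded only once
            seen.append(feature)
    return seen


# Every (output class, feature string) pair becomes one feature function.
pairs = [(label, feat) for feat in unique_features(TOKENS) for label in LABELS]
for n, (label, feat) in enumerate(pairs, start=1):
    print(f'func{n} = if (output = {label} and feature="{feat}") return 1 else return 0')
```

Printing the pairs gives the same func1 to func15 lines as the listing, grouped by POS tag.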

Regarding the part of the documentation that says:

The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.

In this case L would be 3 and N would be 5, giving 3 × 5 = 15 feature functions (func1 to func15 above).
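As a quick check of the L * N arithmetic, here is a hedged Python sketch. The point it illustrates is that N counts unique expanded strings, so a repeated POS tag adds no new functions; the extra "DT" and the extra label B-VP below are hypothetical additions for illustration only.

```python
def count_feature_functions(column_values: list[str], labels: list[str]) -> int:
    """L * N, where N is the number of distinct expanded strings."""
    n_unique = len(set(column_values))  # N
    return len(labels) * n_unique  # L * N


labels = ["B-NP", "I-NP", "O"]  # L = 3 (label set assumed in this answer)
pos_tags = ["PRP", "VBZ", "DT", "JJ", "NN"]  # N = 5

print(count_feature_functions(pos_tags, labels))  # 15
print(count_feature_functions(pos_tags + ["DT"], labels))  # still 15: DT is not new
print(count_feature_functions(pos_tags, labels + ["B-VP"]))  # 20: one more class
```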

answered Oct 20 '22 by niket