I'm new to CRF++. I'm teaching myself by reading its manual: http://crfpp.googlecode.com/svn/trunk/doc/index.html?source=navbar#templ
And I don't understand what this means:
This is a template to describe unigram features. When you give a
template "U01:%x[0,1]", CRF++ automatically generates a set of feature
functions (func1 ... funcN) like:
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
....
funcXX = if (output = B-NP and feature="U01:NN") return 1 else return 0
funcXY = if (output = O and feature="U01:NN") return 1 else return 0
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
Why are there so many lines for the unigram features, and what do they mean?
After looking at the documentation for long enough, I think I figured it out.
Take the example in the documentation where the input data is:
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
current JJ I-NP
account NN I-NP
and the feature template in question is %x[0,1] (the format is %x[row,col], where row is relative to your current position and col is the absolute column index).
When %x[0,1] is expanded, depending on the current token, it yields one of the strings in the set [PRP, VBZ, DT, JJ, NN] (i.e. one of the unique strings from column 1, where the leftmost column is column 0). For each of these strings it creates a set of feature functions of the form (looking at the 3rd row of input data):
func1 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func2 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func3 = if (output = O and feature="U01:DT") return 1 else return 0
...
where that particular string (DT in the code above) is paired with every single output class.
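To make the expansion step concrete, here is a minimal Python sketch of how a %x[row,col] macro could be resolved against the sample data. This is only an illustration, not CRF++'s actual code: the names ROWS, MACRO and expand are made up for this example, and it does not model how CRF++ handles positions that fall before the first or after the last token (it simply raises an error there).

```python
import re

# Sample rows from the documentation: (word, POS tag, chunk label).
ROWS = [
    ("He", "PRP", "B-NP"),
    ("reckons", "VBZ", "B-VP"),
    ("the", "DT", "B-NP"),
    ("current", "JJ", "I-NP"),
    ("account", "NN", "I-NP"),
]

# Matches macros such as %x[0,1] or %x[-1,0].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, rows, pos):
    """Replace each %x[row,col] with the string at (pos + row, col).

    Hypothetical helper for illustration only; boundary handling is an
    assumption and is NOT what CRF++ does at sentence edges.
    """
    def lookup(match):
        row, col = int(match.group(1)), int(match.group(2))
        index = pos + row
        if not 0 <= index < len(rows):
            raise IndexError("position outside the sentence (not modelled here)")
        return rows[index][col]

    return MACRO.sub(lookup, template)


# Expand the unigram template at every token position.
features = [expand("U01:%x[0,1]", ROWS, pos) for pos in range(len(ROWS))]
print(features)  # ['U01:PRP', 'U01:VBZ', 'U01:DT', 'U01:JJ', 'U01:NN']
```

At the 3rd row (pos = 2) the template expands to "U01:DT", which is the string used in the three functions above.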
So if the output classes are [B-NP, I-NP, O] (as in the documentation's example, which leaves out the B-VP label that appears in the sample data), the feature template expands into feature functions like these:
# Row 1 (He, PRP, B-NP)
func1 = if (output = B-NP and feature="U01:PRP") return 1 else return 0
func2 = if (output = I-NP and feature="U01:PRP") return 1 else return 0
func3 = if (output = O and feature="U01:PRP") return 1 else return 0
# Row 2 (reckons, VBZ, B-VP)
func4 = if (output = B-NP and feature="U01:VBZ") return 1 else return 0
func5 = if (output = I-NP and feature="U01:VBZ") return 1 else return 0
func6 = if (output = O and feature="U01:VBZ") return 1 else return 0
# Row 3 (the, DT, B-NP)
func7 = if (output = B-NP and feature="U01:DT") return 1 else return 0
func8 = if (output = I-NP and feature="U01:DT") return 1 else return 0
func9 = if (output = O and feature="U01:DT") return 1 else return 0
# Row 4 (current, JJ, I-NP)
func10 = if (output = B-NP and feature="U01:JJ") return 1 else return 0
func11 = if (output = I-NP and feature="U01:JJ") return 1 else return 0
func12 = if (output = O and feature="U01:JJ") return 1 else return 0
# Row 5 (account, NN, I-NP)
func13 = if (output = B-NP and feature="U01:NN") return 1 else return 0
func14 = if (output = I-NP and feature="U01:NN") return 1 else return 0
func15 = if (output = O and feature="U01:NN") return 1 else return 0
Regarding the part of the documentation that says:
The number of feature functions generated by a template amounts to (L * N), where L is the number of output classes and N is the number of unique strings expanded from the given template.
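In this case L would be 3 (the output classes B-NP, I-NP and O) and N would be 5 (the strings PRP, VBZ, DT, JJ and NN), so the template generates 3 * 5 = 15 feature functions, which matches func1 to func15 above.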
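The L * N count can be checked with a short Python sketch that builds one indicator function per (string, output class) pair. This is an illustration under the same assumptions as above (three output classes, five expanded strings); make_feature_function and the other names are hypothetical and not part of CRF++.

```python
from itertools import product

# Assumed output classes (L = 3) and expanded template strings (N = 5).
LABELS = ["B-NP", "I-NP", "O"]
FEATURES = ["U01:PRP", "U01:VBZ", "U01:DT", "U01:JJ", "U01:NN"]


def make_feature_function(label, feature):
    """Build an indicator: 1 if both the output and the feature match, else 0."""
    def func(output, observed):
        return 1 if output == label and observed == feature else 0
    return func


# One function per (feature, label) pair, in the same order as func1..func15.
functions = [
    make_feature_function(label, feature)
    for feature, label in product(FEATURES, LABELS)
]
print(len(functions))  # 15, i.e. L * N = 3 * 5

# functions[6] corresponds to func7: output = B-NP and feature = "U01:DT".
func7 = functions[6]
print(func7("B-NP", "U01:DT"))  # 1
print(func7("I-NP", "U01:DT"))  # 0
```

During training each of these functions gets its own weight, which is why a single template line turns into many functions.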