Is it possible, and is it a good idea, to add numerical features to CRF models, e.g. the position in the sequence?
I'm using CRFsuite. It seems all features are converted to strings, e.g. 'pos=0', 'pos=1', so they lose their numerical meaning (e.g. the Euclidean distance between values).
Or should I use them to train another model, e.g. an SVM, and then ensemble it with the CRF model?
I figured out that CRFsuite does handle numerical features, at least according to this documentation:
- {"string_key": float_weight, ...} dict where keys are observed features and values are their weights;
- {"string_key": bool, ...} dict; True is converted to 1.0 weight, False to 0.0;
- {"string_key": "string_value", ...} dict; that's the same as {"string_key=string_value": 1.0, ...};
- ["string_key1", "string_key2", ...] list; that's the same as {"string_key1": 1.0, "string_key2": 1.0, ...};
- {"string_prefix": {...}} dicts: the nested dict is processed and "string_prefix" is prepended to each key;
- {"string_prefix": [...]} dicts: the nested list is processed and "string_prefix" is prepended to each key;
- {"string_prefix": set([...])} dicts: the nested set is processed and "string_prefix" is prepended to each key.
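As a rough illustration of the first three forms in the list above, here is a minimal sketch of a per-token feature dict that mixes a string value, a bool, and a float weight. The helper name `token_features`, the feature names, and the choice to scale the position into [0, 1] are assumptions made for this example, not something the documentation prescribes.

```python
def token_features(tokens, i):
    """Build one feature dict for the token at index i (hypothetical helper)."""
    n = len(tokens)
    return {
        # String value: per the list above, same as {"word=<value>": 1.0}.
        "word": tokens[i].lower(),
        # Bool value: True is converted to 1.0, False to 0.0.
        "is_first": i == 0,
        # Float value: kept as a numeric weight; scaled to [0, 1] here
        # (the scaling is an assumption of this sketch).
        "rel_pos": i / max(n - 1, 1),
    }


tokens = ["Numerical", "features", "in", "CRFs"]
xseq = [token_features(tokens, i) for i in range(len(tokens))]

for feats in xseq:
    print(feats)
```

If python-crfsuite is installed, a list of dicts like `xseq` should be the kind of input that `pycrfsuite.Trainer.append(xseq, yseq)` accepts, together with a matching list of labels `yseq`.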
As long as:
A CRF itself can use numerical features, and you should use them. However, if your implementation converts them to strings (i.e. encodes them in binary form via one-hot encoding), they may be of reduced significance. I suggest looking for a more "pure" CRF implementation that allows continuous variables.
A fun fact: a CRF is at its core just a structured MaxEnt (logistic regression) model, which works in the continuous domain. The string encoding is actually a way to map categorical values into that continuous domain, so your problem is really a result of "overdesigning" in CRFsuite, which lost sight of the actual capabilities of the CRF model.
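To make the difference concrete, here is a small sketch of how a log-linear model scores features, comparing a one-hot string encoding of the position with a single real-valued feature. It only shows the per-label dot product, not a full CRF with transition scores, and the weights are made up for illustration.

```python
def score(weights, features):
    """Dot product of a weight dict and a feature dict.

    Features without a learned weight contribute 0.
    """
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# One-hot encoding: every position is a separate feature with its own,
# independent weight. Nothing ties "pos=1" to "pos=2".
onehot_weights = {"pos=0": 0.8, "pos=1": -0.3}  # made-up weights

# Real-valued encoding: one weight, and the contribution grows
# linearly with the feature value.
numeric_weights = {"pos": 0.5}  # made-up weight

# A position never seen in training has no weight in the one-hot case...
print(score(onehot_weights, {"pos=2": 1.0}))

# ...but still gets a contribution in the real-valued case.
print(score(numeric_weights, {"pos": 2.0}))
```

The trade-off is that the real-valued feature can only express a linear effect of the position, while the one-hot features can learn an arbitrary weight per position but cannot generalize between positions.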