Can I use numerical features in a CRF model?

Is it possible, and is it a good idea, to add numerical features to CRF models, e.g. the position in the sequence?

I'm using CRFsuite. It seems all features are converted to strings, e.g. 'pos=0', 'pos=1', so they lose their numerical meaning (for instance, that position 1 is closer to position 0 than position 5 is).

Or should I use them to train another model, e.g. an SVM, and then ensemble it with the CRF model?

Lishu asked Dec 12 '22 03:12


2 Answers

I figured out that CRFsuite does handle numerical features, at least according to this documentation:

  • {"string_key": float_weight, ...} dict where keys are observed features and values are their weights;
  • {"string_key": bool, ...} dict; True is converted to 1.0 weight, False to 0.0;
  • {"string_key": "string_value", ...} dict; that's the same as {"string_key=string_value": 1.0, ...}
  • ["string_key1", "string_key2", ...] list; that's the same as {"string_key1": 1.0, "string_key2": 1.0, ...}
  • {"string_prefix": {...}} dicts: nested dict is processed and "string_prefix" is prepended to each key.
  • {"string_prefix": [...]} dicts: nested list is processed and "string_prefix" is prepended to each key.
  • {"string_prefix": set([...])} dicts: nested list is processed and "string_prefix" is prepended to each key.

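So numerical features work as long as:

  1. I keep the input properly formatted (one of the forms above);
  2. I pass the value as an actual float, not as a string containing a float (a string would be turned into a "key=value" indicator feature);
  3. I normalize the values.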
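
A minimal sketch of what those three points could look like in practice, using only plain Python dicts in the format quoted above. The feature names ("word", "is_first", "rel_pos") and the helper functions are hypothetical, chosen for illustration; the position is normalized to the range [0, 1] by dividing by the last index.

```python
def token_features(tokens, i):
    """Build one feature dict for the token at index i (hypothetical feature set)."""
    n = len(tokens)
    return {
        # String value: treated as the indicator feature "word=<value>" with weight 1.0.
        "word": tokens[i].lower(),
        # Bool value: converted to 1.0 or 0.0.
        "is_first": i == 0,
        # Real float (not a string), normalized to [0, 1].
        "rel_pos": i / (n - 1) if n > 1 else 0.0,
    }


def sentence_features(tokens):
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


xseq = sentence_features(["The", "cat", "sat"])
for features in xseq:
    print(features)
```

A list of dicts like xseq is the kind of item sequence the quoted documentation describes, so it should be usable as the first argument to python-crfsuite's Trainer.append(xseq, yseq) together with a matching label list.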
Lishu answered Jan 03 '23 03:01


A CRF itself can use numerical features, and you should use them. But if your implementation converts them to strings (i.e. encodes them in binary form via one-hot encoding), they may carry less information. I suggest looking for a more "pure" CRF implementation that allows continuous variables.

A fun fact is that a CRF is, at its core, just a structured MaxEnt model (logistic regression), which works in the continuous domain. The string encoding is actually a way to map categorical values into that continuous domain, so your problem is really a result of "overdesigning" in CRFsuite, which forgot about the actual capabilities of the CRF model.
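
To illustrate the difference between indicator (one-hot) features and a real-valued feature in a log-linear model, here is a minimal sketch. The score function and all weights are made up for illustration and are not taken from any trained model or library.

```python
def score(weights, features):
    """Linear part of a log-linear model: sum of weight * feature value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# One-hot style: every position is a separate indicator feature with its own
# weight, so the model knows nothing about positions it has not seen.
one_hot_weights = {"pos=0": 0.5, "pos=1": -0.2}   # hypothetical weights
seen = score(one_hot_weights, {"pos=1": 1.0})
unseen = score(one_hot_weights, {"pos=7": 1.0})   # no weight for "pos=7"

# Continuous style: a single weight is shared by all positions and scaled by
# the feature value, so nearby positions get nearby scores.
continuous_weights = {"pos": -0.8}                # hypothetical weight
near = score(continuous_weights, {"pos": 0.25})
far = score(continuous_weights, {"pos": 0.75})

print(seen, unseen, near, far)
```

With indicator features the unseen position contributes nothing to the score, while the single real-valued feature generalizes linearly across all positions.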

lejlot answered Jan 03 '23 05:01