I'm looking to capture measurements using Stanford CoreNLP. (If you can suggest a different extractor, that is fine too.)
For example, I want to find 15kg, 15 kg, 15.0 kg, 15 kilogram, 15 lbs, 15 pounds, etc. But among CoreNLP's extraction rules, I don't see one for measurements.
Of course, I could do this with pure regexes, but toolkits can run faster, and they offer the opportunity to chunk at a higher level: for example, treating gb and gigabytes as one building block, and RAM and memory as another, even without full syntactic parsing, so that bigger units like 128 gb RAM and 8 gigabytes memory can be built up.
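For comparison, a pure-regex baseline might look like the following sketch. The unit vocabulary and pattern here are illustrative assumptions, not an exhaustive rule set:

```python
import re

# Illustrative unit vocabulary; extend with more units and spellings as needed.
UNITS = r"(?:kg|kilograms?|lbs?|pounds?|gb|gigabytes?)"

# An integer or decimal number, optional whitespace, then a unit.
MEASUREMENT = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNITS})\b", re.IGNORECASE)

def extract_measurements(text):
    """Return (value, unit) pairs found in text."""
    return MEASUREMENT.findall(text)

print(extract_measurements("The laptop has 128 gb RAM and weighs 2.5 kg."))
# → [('128', 'gb'), ('2.5', 'kg')]
```

This handles both the spaced (15 kg) and unspaced (15kg) forms, but it has none of the higher-level chunking described above; every surface variant has to be enumerated in the pattern.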
I want an extractor for this that is rule-based, not machine-learning-based, but I don't see one as part of RegexNER or elsewhere. How do I go about this?
IBM Named Entity Extraction can do this. It runs the regexes efficiently rather than passing the text through each one in turn, and the regexes are bundled to express meaningful entities, for example one that unites all the measurement units into a single concept.
I don't think a rule-based system exists for this particular task out of the box. However, it shouldn't be hard to build with TokensRegexNER. For example, a mapping like:
[{ner:NUMBER}]+ /(k|m|g|t)b/ memory? MEMORY
[{ner:NUMBER}]+ /"|''|in(ches)?/ LENGTH
...
You could try using vanilla TokensRegex as well, and then just extract out the relevant value with a capture group:
(?$group_name [{ner:NUMBER}]+) /(k|m|g|t)b/ memory?
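To wire a mapping file like this into a pipeline, you can enable the regexner annotator and point it at the file. The property names below follow CoreNLP's documented regexner usage; the file names are illustrative:

```properties
# pipeline.properties (file name is illustrative)
annotators = tokenize,ssplit,pos,lemma,ner,regexner
# path to a TokensRegexNER mapping file containing rules like the ones above
regexner.mapping = measurements.rules
```

Note that regexner runs after the statistical ner annotator, so patterns such as [{ner:NUMBER}] can rely on the numeric NER tags already being present.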
You can build your own training data and label the required measurements accordingly.
For example, if you have a sentence like Jack weighs about 50 kgs, the model will classify your input as:
Jack, PERSON
weighs, O
about, O
50, MES
kgs, MES
where MES stands for measurements.
I have recently made training data for the Stanford NER tagger for my own customized problem and built a model from it. I think you can do the same thing with Stanford CoreNLP's NER.
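Stanford's CRF-based NER classifier trains from tab-separated token/label files, one token per line with a blank line between sentences. The sentence above, with the custom MES label, would look like this in the training file:

```
Jack	PERSON
weighs	O
about	O
50	MES
kgs	MES
```

You then point the CRFClassifier's training properties (trainFile, serializeTo, and a map describing the columns) at this file to build the model.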
Keep in mind that this is a machine-learning-based approach rather than a rule-based one.