How can Stanford CoreNLP Named Entity Recognition capture measurements like 5 inches, 5", 5 in., 5 in

Question

I'm looking to capture measurements using Stanford CoreNLP. (If you can suggest a different extractor, that is fine too.)

For example, I want to find 15kg, 15 kg, 15.0 kg, 15 kilogram, 15 lbs, 15 pounds, etc. But among CoreNLP's extraction rules, I don't see one for measurements.

Of course, I can do this with pure regexes, but toolkits can run more quickly, and they offer the opportunity to chunk at a higher level, e.g. to treat gb and gigabytes together, and RAM and memory as building blocks--even without full syntactic parsing--as they build bigger units like 128 gb RAM and 8 gigabytes memory.

I want an extractor for this that is rule-based (not machine-learning-based), but I don't see one as part of RegexNER or elsewhere. How do I go about this?

IBM Named Entity Extraction can do this. It runs its regexes efficiently instead of passing the text through each one in turn, and it bundles regexes into meaningful entities, for example a single concept that unites all the measurement units.

Gabor Angeli · Accepted Answer

I don't think a ready-made rule-based system exists for this particular task. However, it shouldn't be hard to build one with TokensRegexNER. For example, a mapping like:

[{ner:NUMBER}]+ /(k|m|g|t)b/ memory?   MEMORY
[{ner:NUMBER}]+ /"|''|in(ches)?/       LENGTH
...

You could try using vanilla TokensRegex as well, and then just extract out the relevant value with a capture group:

(?$group_name [{ner:NUMBER}]+) /(k|m|g|t)b/ memory?
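
For readers who want to try the idea without setting up CoreNLP, the same "number followed by a unit, mapped to an entity label" rule can be sketched in plain Python. This is not TokensRegex and not part of CoreNLP; the unit table, the labels, and the function name extract_measurements are illustrative assumptions, and the table covers only a few of the units mentioned in the question.

```python
import re

# Hypothetical unit table (an assumption for illustration):
# entity label -> canonical unit -> accepted surface forms.
UNITS = {
    "WEIGHT": {
        "kg": ["kg", "kgs", "kilogram", "kilograms"],
        "lb": ["lb", "lbs", "pound", "pounds"],
    },
    "LENGTH": {
        "in": ["in", "in.", "inch", "inches", '"', "''"],
    },
    "MEMORY": {
        "kb": ["kb"],
        "mb": ["mb"],
        "gb": ["gb", "gigabyte", "gigabytes"],
        "tb": ["tb"],
    },
}

# Flatten to: surface form -> (label, canonical unit).
LOOKUP = {
    form: (label, canon)
    for label, units in UNITS.items()
    for canon, forms in units.items()
    for form in forms
}

# Longest forms first, so "inches" is tried before "in".
_unit_alt = "|".join(re.escape(f) for f in sorted(LOOKUP, key=len, reverse=True))

# A number, optional whitespace, a unit, and no letter right after the unit.
PATTERN = re.compile(
    r"(?<![\w.])(\d+(?:\.\d+)?)\s*(" + _unit_alt + r")(?![A-Za-z])",
    re.IGNORECASE,
)


def extract_measurements(text):
    """Return (matched text, value, canonical unit, label) for each match."""
    results = []
    for m in PATTERN.finditer(text):
        label, canon = LOOKUP[m.group(2).lower()]
        results.append((m.group(0), float(m.group(1)), canon, label))
    return results
```

Calling extract_measurements("128 gb RAM and 8 gigabytes memory") yields two MEMORY matches that share the canonical unit "gb". A rule this simple will also match phrases like "5 in the morning", which is one reason to prefer token-level rules with NER or part-of-speech constraints, as in the TokensRegex patterns above.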

Rohan Amrute · Answer

You can build your own training data and label the required measurements accordingly.

For example, given a sentence like "Jack weighs about 50 kgs", the trained model would classify the tokens as:

Jack, PERSON
weighs, O
about, O
50, MES
kgs, MES

Where MES stands for measurements.
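
Training data for the Stanford NER tagger is normally a tab-separated file with one token per line and its label in a second column, with a blank line between sentences. A minimal sketch of writing that layout is below; the function name to_training_tsv is hypothetical, and it assumes the trainer's column mapping is set to word first, label second.

```python
def to_training_tsv(sentences):
    """Render labeled sentences as token<TAB>label lines.

    `sentences` is a list of sentences, each a list of (token, label) pairs.
    Sentences are separated by a blank line.
    """
    blocks = []
    for sentence in sentences:
        blocks.append("\n".join(f"{token}\t{label}" for token, label in sentence))
    return "\n\n".join(blocks) + "\n"


# The example sentence from above, using the MES label.
example = [[
    ("Jack", "PERSON"),
    ("weighs", "O"),
    ("about", "O"),
    ("50", "MES"),
    ("kgs", "MES"),
]]
print(to_training_tsv(example), end="")
```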

I recently built training data for the Stanford NER tagger for a custom problem of my own and trained a model on it.

I think you can do the same thing with Stanford CoreNLP NER.

Note that this is a machine-learning-based approach rather than a rule-based one.

Tags:

nlp

stanford-nlp

named-entity-recognition

named-entity-extraction

Joshua Fox
