
Text processing in Java

Tags:

java

opennlp

This is a tricky problem for which I can't figure out a good solution. Suppose we have a String in Java: "He ate 3 apples today." The digit 3 can easily be identified using an isNumeric-style check or a regular expression. But what if I have a String like "He ate three apples today."? How can I identify that "three" is actually a number? I tried OpenNLP and its POS tagger, but it takes far too long. Can anyone suggest a better solution? Also, among OpenNLP's ".bin" files there is one called "num.bin", but I don't know how to use it, and the OpenNLP documentation says nothing about it. Can anyone tell me whether this is what I've been looking for and, if so, how to use it?

Update: I'm short of time, so I've settled on a temporary solution. I'll build a file/dictionary of number words and load all the entries into a hashtable. Then I'll tokenize my sentence and check it word by word for numbers, similar to what you suggested. I'll keep updating the file as and when required. Thanks for your valuable suggestions, and if you have something better than this I'd be really glad to hear it. OpenNLP handles this very well; the only problem is how long it takes, and I want to do this in the minimum time possible.
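The temporary solution described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class name `NumberWordFinder`, the method `findNumbers`, and the tiny hard-coded word list are all assumptions made for the example, and a real dictionary would presumably be loaded from the file mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NumberWordFinder {
    // Hypothetical, deliberately small dictionary; extend it or load it from a file.
    private static final Set<String> NUMBER_WORDS = new HashSet<>(Arrays.asList(
            "zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"));

    /** Returns every token that is a digit string or a known number word, in order. */
    static List<String> findNumbers(String sentence) {
        List<String> found = new ArrayList<>();
        // Lowercase, then split on anything that is not an ASCII letter or digit,
        // so that "today." becomes "today".
        String[] tokens = sentence.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with punctuation
            }
            // Hash lookup for words, simple regex check for digit strings.
            if (NUMBER_WORDS.contains(token) || token.matches("[0-9]+")) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("He ate three apples today.")); // [three]
        System.out.println(findNumbers("He ate 3 apples today."));     // [3]
    }
}
```

Each token costs one hash lookup, so the work grows linearly with the length of the sentence. Note that this sketch splits a compound such as "twenty-one" into separate tokens and does not combine them into a single value.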

Manan Pancholi asked Nov 14 '22 11:11


1 Answer

Create a dictionary of number words, then search the text for entries from that dictionary.

Check the asymptotic complexity; it may be cheaper to sort the text first.
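One way to weigh the complexity, assuming n tokens in the text and d dictionary entries: a hash-based dictionary gives expected O(n) total lookup time, a sorted dictionary with binary search gives O(n log d), and sorting the text itself costs O(n log n) before any matching happens. The sketch below shows the sorted-dictionary variant for comparison. It is a swapped-in illustration rather than the exact approach suggested above, and the class name `SortedDictionaryLookup` and its word list are hypothetical.

```java
import java.util.Arrays;
import java.util.Locale;

class SortedDictionaryLookup {
    // Hypothetical dictionary; sorted once up front so binary search is valid.
    private static final String[] SORTED_WORDS = {
            "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"};

    static {
        Arrays.sort(SORTED_WORDS);
    }

    /** Returns true if the token is in the dictionary; O(log d) comparisons per call. */
    static boolean isNumberWord(String token) {
        return Arrays.binarySearch(SORTED_WORDS, token.toLowerCase(Locale.ROOT)) >= 0;
    }

    public static void main(String[] args) {
        for (String token : "He ate three apples today".split(" ")) {
            System.out.println(token + " -> " + isNumberWord(token));
        }
    }
}
```

For a small, fixed dictionary the difference between the two lookups is unlikely to matter in practice; measuring on real input is the safer guide.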

A T answered Nov 16 '22 04:11