NLTK for Named Entity Recognition

Tags:

machine-learning

text-processing

nlp

nltk

named-entity-recognition

I am trying to use the NLTK toolkit to extract places, dates and times from text messages. I just installed the toolkit on my machine and wrote this quick snippet to test it out:

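import nltk
sentence = "Let's meet tomorrow at 9 pm"
tokens = nltk.word_tokenize(sentence)  # split the sentence into word tokens
pos_tags = nltk.pos_tag(tokens)  # attach a part-of-speech tag to each token
print(nltk.ne_chunk(pos_tags, binary=True))  # chunk named entities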

I assumed it would identify the date (tomorrow) and the time (9 pm), but surprisingly it recognized neither. Running the code above gives the following result:

(S (GPE Let/NNP) 's/POS meet/NN tomorrow/NN at/IN 9/CD pm/NN)

Can someone help me understand whether I am missing something, or whether NLTK is just not mature enough to tag times and dates properly? Thanks!

895

asked Oct 11 '13 07:10

Darth.Vader

2 Answers

The default NE chunker in NLTK is a maximum entropy chunker trained on the ACE corpus (http://catalog.ldc.upenn.edu/LDC2005T09). It has not been trained to recognise dates and times, so you need to train your own classifier if you want to do that.

Have a look at http://mattshomepage.com/articles/2016/May/23/nltk_nec/; it explains the whole process very well.

Also, there is a module called timex in nltk_contrib which might help: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/timex.py
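If training a classifier is more than you need, a rule-based pass can catch simple cases like the one in the question. Here is a minimal sketch using only Python's `re` module. The pattern list and the name `find_temporal_expressions` are my own illustrative assumptions, not part of NLTK or nltk_contrib, and they cover only a handful of English expressions:

```python
import re

# Hypothetical, deliberately small pattern set; extend it for real use.
TEMPORAL_PATTERNS = [
    ("DATE", r"\b(?:today|tomorrow|yesterday|tonight)\b"),
    ("DATE", r"\b(?:mon|tues|wednes|thurs|fri|satur|sun)day\b"),
    ("TIME", r"\b\d{1,2}(?::\d{2})?\s*(?:am|pm)\b"),
    ("TIME", r"\b(?:noon|midnight)\b"),
]


def find_temporal_expressions(text):
    """Return (label, matched_text, start_offset) tuples sorted by position."""
    found = []
    for label, pattern in TEMPORAL_PATTERNS:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((label, match.group(0), match.start()))
    # Sort by character offset so results follow the order of the text.
    return sorted(found, key=lambda item: item[2])


print(find_temporal_expressions("Let's meet tomorrow at 9 pm"))
```

For the sentence in the question this tags "tomorrow" as a DATE and "9 pm" as a TIME. It will miss anything outside the listed patterns (for example "next Friday at half past nine"), which is where a trained model or a dedicated temporal tagger becomes worthwhile.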

183

answered Sep 19 '22 08:09

Viktor Vojnovski

Named entity recognition is not an easy problem, so do not expect any library to be 100% accurate. You shouldn't draw conclusions about NLTK's performance from a single sentence. Here's another example:

sentence = "I went to New York to meet John Smith"

I get

(S
  I/PRP
  went/VBD
  to/TO
  (NE New/NNP York/NNP)
  to/TO
  meet/VB
  (NE John/NNP Smith/NNP))

As you can see, NLTK does very well here. However, I couldn't get NLTK to recognise today or tomorrow as temporal expressions. You can try Stanford SUTime, which is part of Stanford CoreNLP. I have used it before and it works quite well (it is in Java, though).
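If you want to use the chunker's result in code rather than read the printed tree, you can collect the entities by walking the top level of the tree. This is a minimal sketch, assuming NLTK 3 (where chunk subtrees expose `label()`; older releases used a `node` attribute instead) and a tree built from POS-tagged tokens. `extract_entities` is a hypothetical helper name, not an NLTK function:

```python
def extract_entities(tree):
    """Collect (label, text) pairs for each chunk directly under the root.

    Assumes `tree` is what nltk.ne_chunk returns for POS-tagged input:
    plain tokens are (word, tag) tuples, chunks are subtrees with label().
    """
    entities = []
    for node in tree:
        # Tuples (ordinary tokens) have no label(); chunk subtrees do.
        if hasattr(node, "label"):
            words = " ".join(word for word, tag in node)
            entities.append((node.label(), words))
    return entities


# Usage, assuming NLTK and its models are installed:
#   import nltk
#   tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
#   print(extract_entities(nltk.ne_chunk(tagged, binary=True)))
```

With `binary=True` every chunk carries the generic `NE` label, as in the output above. Without it, the chunker assigns specific types such as `PERSON` or `GPE`.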

answered Sep 17 '22 08:09

mbatchkarov
