
Extracting Important words from a sentence using Node

Tags: node.js, nlp

I admit that I haven't searched the SO database extensively. I looked at the natural npm package, but it doesn't seem to provide this feature. I would like to know whether the requirement below is feasible.

I have a database that lists all the cities of a country. I also have ratings for these cities (best place to live, worst place to live, best rated city, worst rated city, etc.). From the user interface, I would like to let the user enter free text and use that text to search my database.

For example: "Best place to live in California", "places near California", or "places in California".

From a sentence like these, I want to extract only the nouns (maybe), since these will be the name of the city or country that I can search for.

Then extracting 'best' would tell me to sort the results in a particular order, and so on.

Any suggestions or directions to look for?

I realise the question risks being marked as 'debatable', but I posted it to get some direction on how to proceed.

Vaya asked Jan 23 '14 22:01



2 Answers

[I came across this question whilst looking for some use cases to test a module I'm working on. Obviously the question is a little old, but since my module addresses the question I thought I might as well add some information here for future searchers.]

You should be able to do what you want with a POS chunker. I've recently released one for Node that is modelled on the chunkers provided by the NLTK (Python) and Stanford NLP (Java) libraries (the chunk() and TokensRegex() methods, respectively).

The module processes strings that already contain parts-of-speech, so first you'll need to run your text through a parts-of-speech tagger, such as pos:

var pos = require('pos');

// Tokenize the sentence, tag each token, then join the results as "word/TAG" pairs.
var words = new pos.Lexer().lex('Best place to live in California.');
var tags = new pos.Tagger()
  .tag(words)
  .map(function(tag){return tag[0] + '/' + tag[1];})
  .join(' ');

This will give you:

Best/JJS place/NN to/TO live/VB in/IN California/NNP ./.

Now you can use pos-chunker to find all proper nouns:

var chunker = require('pos-chunker');

var places = chunker.chunk(tags, '[{ tag: NNP }]');

This will give you:

Best/JJS place/NN to/TO live/VB in/IN {California/NNP} ./.

Similarly you could extract verbs to understand what people want to do ('live', 'swim', 'eat', etc.):

var verbs = chunker.chunk(tags, '[{ tag: VB }]');

Which would yield:

Best/JJS place/NN to/TO {live/VB} in/IN California/NNP ./.

You can also match words, sequences of words and tags, use lookahead, group sequences together to create chunks (and then match on those), and other such things.
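To use the results in a database query you will probably want plain words rather than the annotated string. Here is a minimal sketch that parses the text format shown above. Note that extractChunks and wordsWithTag are hypothetical helper names of my own, not part of pos or pos-chunker, and I'm assuming a multi-word chunk would appear as several word/TAG pairs inside one pair of braces.

```javascript
// Sketch only: these helpers parse the plain-text format shown above, e.g.
// "Best/JJS place/NN to/TO live/VB in/IN {California/NNP} ./."
// They are hypothetical and not part of the pos or pos-chunker modules.

// Return the words inside each {...} chunk, with the /TAG suffixes removed.
function extractChunks(chunked) {
  var chunks = [];
  var re = /\{([^}]*)\}/g;
  var match;
  while ((match = re.exec(chunked)) !== null) {
    var words = match[1]
      .split(/\s+/)
      .filter(Boolean)
      .map(function (token) {
        // Split on the last slash so words that contain "/" stay intact.
        return token.slice(0, token.lastIndexOf('/'));
      });
    chunks.push(words.join(' '));
  }
  return chunks;
}

// Return every word in a tagged (or chunked) string that carries the given tag.
function wordsWithTag(tagged, tag) {
  return tagged
    .replace(/[{}]/g, '')
    .split(/\s+/)
    .filter(Boolean)
    .map(function (token) {
      var i = token.lastIndexOf('/');
      return { word: token.slice(0, i), tag: token.slice(i + 1) };
    })
    .filter(function (pair) { return pair.tag === tag; })
    .map(function (pair) { return pair.word; });
}
```

With the sentence above, extractChunks on the proper-noun result gives the place name to search for, and wordsWithTag(tags, 'JJS') picks out the superlative ('Best'), which you could map to a sort order.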

Mark Birbeck answered Nov 08 '22 18:11



You probably don't have to identify what is a noun. Since you already have a list of city and country names that your system can handle, you just have to check whether the user input contains one of these names.
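As a rough sketch of that check (the knownPlaces array and the findKnownPlaces helper are made-up names for illustration; in practice the list would come from your database), you could test each known name against the input as a case-insensitive whole-word match:

```javascript
// Hypothetical sample data; in practice these names would be loaded from the database.
var knownPlaces = ['California', 'San Francisco', 'Los Angeles', 'Texas'];

// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the known place names that occur in the input as whole words,
// ignoring case. Word boundaries stop "Texas" from matching "Texasville".
function findKnownPlaces(input, places) {
  return places.filter(function (place) {
    var pattern = new RegExp('\\b' + escapeRegExp(place) + '\\b', 'i');
    return pattern.test(input);
  });
}

console.log(findKnownPlaces('Best place to live in California', knownPlaces));
// logs [ 'California' ]
```

This tests every name on every query, which is probably fine for one country's cities. It also assumes names start and end with a letter or digit, since that is what the \b boundaries rely on.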

Thomas answered Nov 08 '22 18:11
