Word and text relation using Python and NLP

I have a word, and I want to find out whether a given text is related to that word, using Python and NLTK. Is this possible?

For example, I have the word "phosphorus". I would like to find out whether a particular text file is related to this word.

I can't use a bag-of-words model in NLTK, as I have only one word and no training data.

Any suggestions?

Thanks in advance.

Abhilash Kumar asked Sep 03 '14 05:09


2 Answers

Not without a corpus, no.

Look at it this way: can you, an intelligent being, tell whether 光 is related to 部屋に入った時電気をつけました without asking someone or something that actually knows Japanese? (This assumes you don't know Japanese; if you do, try "svjetlo" and "Kad je ušao u sobu, upalio je lampu" instead.) If you can't, how do you expect a computer to do it?

And another experiment: can you, an intelligent being, give me an algorithm by which you could teach a non-English-speaking person that "light" is related to "When he entered the room, he turned on the lamp"? Again, no.

tl;dr: You need training data, unless you significantly restrict the meaning of "related" (to "contains", for example).
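If "related" is narrowed to "contains", as suggested above, no training data is needed at all. A minimal sketch, assuming a case-insensitive match on whole alphabetic tokens; the function name `contains_word` is hypothetical and not part of NLTK:

```python
import re


def contains_word(word: str, text: str) -> bool:
    """Return True if `word` appears in `text` as a whole token, ignoring case.

    This covers only the narrow "related means contains" reading;
    it knows nothing about synonyms or topics.
    """
    # Lowercase, then keep runs of letters only, so "Phosphorus," yields "phosphorus".
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return word.lower() in tokens


if __name__ == "__main__":
    sample = "Phosphorus is an essential nutrient for plants."
    print(contains_word("phosphorus", sample))  # True
    print(contains_word("nitrogen", sample))    # False
```

Lemmatizing the tokens first (for example with NLTK's `WordNetLemmatizer`) would also catch inflected forms; it is left out here to keep the sketch dependency-free.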

Amadan answered Oct 17 '22 04:10


You can use NLTK's WordNet interface to calculate a path similarity score between the word and the words in your text, and then build a heuristic based on those scores:

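    from nltk.corpus import wordnet as wn

    # Requires the WordNet data: run nltk.download('wordnet') once beforehand.
    hit = wn.synset('hit.v.01')    # first verb sense of "hit"
    slap = wn.synset('slap.v.01')  # first verb sense of "slap"
    wn.path_similarity(hit, slap)  # in (0, 1]; higher = closer in the hypernym hierarchy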

You can find more NLTK WordNet usage examples here: http://www.nltk.org/howto/wordnet.html
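One way to turn those pairwise scores into a yes/no answer for a whole text: for each distinct word in the text, take the best path similarity between any of its WordNet senses and any sense of the target word, then check whether enough words score above a cut-off. This aggregation, the function names, and the default `threshold=0.25` and `min_fraction=0.05` are illustrative assumptions, not something the answer prescribes. The similarity function is passed in as a parameter so the scoring logic can be tried without the WordNet data; `wordnet_similarity` shows how it could be backed by NLTK.

```python
import re
from typing import Callable, Optional

# A similarity function maps two words to a score in [0, 1], or None if undefined.
Similarity = Callable[[str, str], Optional[float]]


def wordnet_similarity(a: str, b: str) -> Optional[float]:
    """Best WordNet path similarity between any sense of `a` and any sense of `b`.

    Needs NLTK and its WordNet data (nltk.download('wordnet')). The import is
    done lazily so the rest of this sketch runs without them.
    """
    from nltk.corpus import wordnet as wn

    best: Optional[float] = None
    for sa in wn.synsets(a):
        for sb in wn.synsets(b):
            score = sa.path_similarity(sb)
            if score is not None and (best is None or score > best):
                best = score
    return best


def is_related(
    keyword: str,
    text: str,
    similarity: Similarity,
    threshold: float = 0.25,
    min_fraction: float = 0.05,
) -> bool:
    """Heuristic: the text is 'related' if enough of its words are close to `keyword`."""
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    if not words:
        return False
    close = 0
    for w in words:
        score = similarity(keyword, w)
        if score is not None and score >= threshold:
            close += 1
    # Fraction of distinct words that are semantically close to the keyword.
    return close / len(words) >= min_fraction
```

With NLTK and the WordNet data installed, a call would look like `is_related("phosphorus", open("doc.txt").read(), wordnet_similarity)`, where `doc.txt` is a placeholder path. The two thresholds would need tuning on a few example documents, which is, in a small way, the training data the first answer says you cannot avoid.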

D Volsky answered Oct 17 '22 06:10