Defining the context of a word - Python

I think this is an interesting question, at least for me.


I have a list of words, let's say:

photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet

and I have a list of contexts:

  • Programming
  • World news
  • Technology
  • Web Design

I need to try and match words with the appropriate context/contexts if possible.

Maybe discovering word relationships in some way.


Any ideas?

Help would be much appreciated!

RadiantHex asked Mar 23 '10 14:03


2 Answers

This sounds like it's more of a categorization/ontology problem than NLP. Try WordNet for a standard ontology.

I don't see any real NLP in your stated problem, but if you do need some semantic analysis or a parser, try NLTK.

adam answered Sep 21 '22 02:09


Where do these words come from? Do they come from real texts? If they do, then this is a classic data mining problem. What you need to do is convert your set of documents into a matrix where each row represents a document and each column represents a word that appears in the documents.

For example if you have two documents like this:

D1: Need to find meaning.
D2: Need to separate Apples from oranges

your matrix will look like this:

      Need  to  find  meaning  Apples  Oranges  Separate  From
D1:   1     1   1     1        0       0        0         0
D2:   1     1   0     0        1       1        1         1

This is called a term-by-document matrix (with documents as rows, as here, it is also commonly called a document-term matrix).
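A minimal sketch of building such a matrix in plain Python, using the two example documents. The function name `build_term_document_matrix` and the tokenization rule (lowercase, split on non-alphanumeric characters) are my assumptions, not something the approach prescribes; columns come out in order of first appearance, so they are ordered slightly differently from the table.

```python
import re

def build_term_document_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts vocabulary[j] in documents[i]."""
    # Assumed tokenization: lowercase, keep runs of letters and digits.
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]

    # Build the vocabulary in order of first appearance so columns are reproducible.
    vocabulary = []
    seen = set()
    for tokens in tokenized:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)

    # One row per document, one column per vocabulary term.
    column = {term: j for j, term in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in documents]
    for i, tokens in enumerate(tokenized):
        for token in tokens:
            matrix[i][column[token]] += 1
    return vocabulary, matrix

documents = ["Need to find meaning.", "Need to separate Apples from oranges"]
vocabulary, matrix = build_term_document_matrix(documents)

print(vocabulary)
for name, row in zip(["D1", "D2"], matrix):
    print(name, row)
```

For real texts you would probably also want to drop very common words such as "to" and "from", since they appear in almost every document and carry little information about the topic.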

Having collected these statistics, you can use algorithms like K-Means to group similar documents together. Since you already know how many concepts you have, your task should be somewhat easier. K-Means can be slow when there are many word columns, so you can try to speed it up by first reducing the matrix with techniques such as SVD.
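To make the clustering step concrete, here is a rough sketch of K-Means in plain Python. The four rows and their columns (css, tutorial, china, politics) are made-up illustration data, and seeding the centroids with the first k rows is a simplification I chose to keep the run deterministic; real implementations usually pick random or k-means++ seeds. The SVD step is left out.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(rows, k, max_iterations=100):
    """Cluster rows (lists of numbers) into k groups; returns one label per row."""
    # Assumption: seed centroids with the first k rows (deterministic, not robust).
    centroids = [list(row) for row in rows[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # Assignments stopped changing, so the algorithm has converged.
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical word counts; columns: css, tutorial, china, politics.
rows = [
    [2, 1, 0, 0],
    [0, 0, 1, 2],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
]
labels = k_means(rows, k=2)
print(labels)
```

Here k is set to the number of contexts you already know. Note that K-Means only groups the documents; mapping each cluster to a named context such as "Web Design" is still up to you. In practice a library is a better choice than hand-rolled code: scikit-learn, for example, provides `KMeans` and `TruncatedSVD`.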

Vlad answered Sep 22 '22 02:09