python data mining

Question

I am not very familiar with data mining, but I need some ideas on clustering. Let me first describe my problem.

I have around 100 data sheets that contain user reviews. I am trying to find, for instance, the words that describe quality. One person might say "amazing quality" and another "great quality". I need to cluster the documents that contain such similar sentences and get the frequency of those sentences. What concept should I apply here?

I guess I have to specify some stop words and synonyms, but I am not too familiar with this concept.

Can someone give me some detailed links or an explanation? And what tool should I use? I am basically a Python programmer, so any Python module would be appreciated.

Thank You

Andrey Sboev · Accepted Answer

There is http://www.nltk.org/ for language processing. With this library you can split text into sentences, calculate term frequencies, find synonyms, and more.
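As a rough illustration of that kind of preprocessing, here is a minimal sketch that uses only the Python standard library rather than NLTK itself. The stop-word set, the synonym map, and the sample review are all made up for this example; NLTK provides real stop-word lists and WordNet for synonyms, which would replace the hand-written ones here.

```python
import re
from collections import Counter

# Hypothetical stop words and synonym map, invented for this sketch.
STOP_WORDS = {"the", "is", "a", "of", "and", "it", "has", "this", "was"}
SYNONYMS = {"amazing": "great", "excellent": "great", "awesome": "great"}

def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    """Lowercase, keep alphabetic tokens, drop stop words, map synonyms."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [SYNONYMS.get(w, w) for w in words if w not in STOP_WORDS]

def term_frequencies(text):
    """Count normalized terms over all sentences of the text."""
    counts = Counter()
    for sentence in split_sentences(text):
        counts.update(tokenize(sentence))
    return counts

review = "It is amazing quality. The quality is great! Shipping was slow."
freqs = term_frequencies(review)
print(freqs.most_common(3))
```

Because "amazing" is mapped to "great" before counting, both sentences about quality contribute to the same term, which is the effect the question is after.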

Carrot^2 is a nice open-source project for clustering text snippets; unfortunately it's written in Java. The idea behind its clustering is the frequencies of terms and phrases (bigrams and trigrams). After preprocessing, each document (snippet, review) is represented as a vector of term/phrase frequencies. To calculate clusters, they use some linear algebra and find principal components in that term space. These components are then used to form clusters and their labels.
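To give a feel for the vector representation, here is a small sketch in plain Python. It is not Carrot^2's algorithm: instead of principal components it uses cosine similarity with a simple greedy grouping, which is a much cruder technique. The function names, the threshold of 0.25, and the sample reviews are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def phrase_vector(text):
    """Represent a text as counts of its unigrams and bigrams."""
    words = re.findall(r"[a-z']+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return Counter(words + bigrams)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def greedy_clusters(docs, threshold=0.25):
    """Put each doc in the first cluster whose first member is similar enough."""
    vectors = [phrase_vector(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([i])
    return clusters

reviews = [
    "great quality and great price",
    "the quality is great",
    "shipping was slow",
    "slow shipping and late delivery",
]
clusters = greedy_clusters(reviews)
print(clusters)
```

The threshold has to be tuned by hand here, and the result depends on document order. A decomposition-based method like the one described above avoids both problems, at the cost of needing a linear algebra library.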

In your case it's worth treating reviews as documents, clustering them, and getting labels for the clusters. Maybe the labels would in some way characterize the reviews.

In your specific case it's worth keeping only the words of interest and eliminating the rest. That dramatically decreases dimensionality, which is critical in such tasks.
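One possible way to apply this to the question's example is to look only at the word that comes right before a target word such as "quality" and count those. This is a sketch of my reading of the idea, not a standard recipe; the function name `modifiers_of` and the sample reviews are hypothetical.

```python
import re
from collections import Counter

def modifiers_of(target, reviews):
    """Count the word that immediately precedes `target` in each review."""
    pattern = re.compile(r"\b([a-z]+)\s+" + re.escape(target) + r"\b")
    counts = Counter()
    for review in reviews:
        counts.update(pattern.findall(review.lower()))
    return counts

reviews = [
    "Amazing quality, would buy again.",
    "Great quality for the price.",
    "The build is solid and it is great quality.",
    "Poor quality, broke after a week.",
]
quality_words = modifiers_of("quality", reviews)
print(quality_words.most_common())
```

The vocabulary shrinks from every word in the reviews to a handful of modifiers. In real data the preceding word will often be a stop word ("the quality"), so a stop-word filter and a synonym map would still be needed on top of this.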

Another useful project is MontyLingua.

Tags:

python

nlp

data-mining

Rkz
