I'm new to data mining and experimenting a bit.
Let's say I have N Twitter users, and what I want to find
is the overall theme each of them is writing about (based on their tweets).
Then I want to give each theme a higher weight if its user has more followers.
Then I want to merge themes that are similar enough, while still retaining the weighting by follower count.
So basically, a list of "important" themes ranked by authority (the user's follower count).
For instance, something like news.google.com, but with the ranking based on the follower counts of the users responsible for each theme.
I'd prefer something in Python, since that's the language I'm most familiar with.
Any ideas?
Thanks
EDIT: Here's a good example of what I'm trying to do (but with different data): http://www.facebook.com/notes/facebook-data-team/whats-on-your-mind/477517358858
Basically, analyzing various kinds of data and their correlation to each other: word categories and each person's age, or word categories and friend count, as in this example.
Where would I begin to solve this and generate such graphs?
Generally speaking: R has some packages aimed specifically at text mining and data mining, offering a wide range of techniques. I don't know of comparable packages in Python, but that doesn't mean they don't exist. I just wouldn't implement it all myself; it's a bit more complicated than it looks at first sight.
Some things you have to consider:
If you have a general idea about this, you can start by using the tm package to extract all the information into a workable format. The package is built around term-document matrices and metadata objects. These let you compute weighted frequencies for the different themes, provided you have defined what you consider a theme. You can also use different weighting functions to obtain what you want. The manual is here. But please also visit crossvalidated.com for extra guidance if you're not sure about what you're doing. This is really more a question about data mining than about programming.
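Since you asked for Python: the weighted-frequency idea can be sketched with the standard library alone. This is only a rough illustration, not a replacement for a proper text-mining package. It assumes that single words stand in for "themes", that a user's weight is log(1 + followers) so that very large accounts don't drown out everyone else, and that the tweets are already downloaded. The sample data, the tiny stopword list, and the function names are all made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical sample data: user -> (follower count, list of tweets).
USERS = {
    "alice": (1000, ["Python data mining is fun", "mining tweets with python"]),
    "bob": (10, ["football tonight", "great football match"]),
}

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"is", "with", "the", "a", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_frequencies(tweets):
    """Relative term frequencies over all of one user's tweets."""
    counts = Counter(w for tweet in tweets for w in tokenize(tweet))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def weighted_terms(users):
    """Sum every user's term frequencies, scaled by log(1 + followers).

    The log damping is an assumption of this sketch, not a requirement.
    """
    scores = Counter()
    for followers, tweets in users.values():
        weight = math.log1p(followers)
        for word, freq in term_frequencies(tweets).items():
            scores[word] += weight * freq
    return scores


scores = weighted_terms(USERS)
ranking = scores.most_common()  # list of (term, score), highest score first

for term, score in ranking[:5]:
    print(f"{term}: {score:.3f}")
```

Because the scores of all users are added into one counter, a term used by several users automatically accumulates their combined weight. Merging themes that are merely similar (rather than identical words) is the harder part, and that is where defining what you consider a theme comes in.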