Representation and a good similarity measure between Tweets for topic detection

Question

I'm planning to write a tool for topic detection on Twitter. I've been thinking about a good similarity measure (distance) between two tweets and about how to represent them, taking into account:

The #hashtags (I think hashtags are very important when detecting topics on Twitter)
The replies (if someone replies to a tweet, the two tweets are probably about the same topic, although two people could start out talking about the Samsung Galaxy and end up talking about iPhone jailbreaking, etc.)

I'm thinking of implementing what I have so far and running some experiments. I'll implement the classic models (such as TF-IDF with Euclidean distance, cosine similarity, etc.) and the boolean models with a few similarity measures (Hamming, Jaccard, etc.).
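To make the two families concrete, here is a minimal standard-library sketch of a TF-IDF representation compared with cosine similarity, next to a boolean (set) representation compared with Jaccard. The tokenizer, the smoothed IDF formula and the sample tweets are my own assumptions for illustration, not a recommended setup:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and keep words, #hashtags and @mentions as tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many tweets each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption), so common terms keep a small non-zero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard similarity between two token collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up sample tweets, only for illustration.
tweets = [
    "Trying the new #iphone jailbreak tonight",
    "Is the #iphone jailbreak safe to install?",
    "Samsung Galaxy battery life is great #android",
]
docs = [tokenize(t) for t in tweets]
vecs = tfidf_vectors(docs)

print("cosine :", cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
print("jaccard:", jaccard(docs[0], docs[1]), jaccard(docs[0], docs[2]))
```

Because hashtags survive tokenization as their own tokens (`#iphone`), they contribute to both measures. Giving them extra weight would be a further design choice that this sketch does not make.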

Any ideas on how to adapt an existing model to Twitter, or on how to create a new one?

Pulkit Goyal · Accepted Answer

Similarity Metrics on Twitter discusses some details of the different similarity measures you can use to cluster Twitter data. We did some research on clustering Twitter users based on user connections, user mentions, geolocation, content similarity between tweets, content similarity between user descriptions, and common #hashtags.
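One simple way to combine several of these signals for a pair of tweets is a weighted sum. The sketch below is only an illustration and not the method from the linked post: the `Tweet` class, the `combined_similarity` function and the weights are all hypothetical, and it covers just three signals (text overlap, shared hashtags, and a direct reply link).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    # Hypothetical minimal record; real Twitter data carries many more fields.
    id: int
    text: str
    in_reply_to: Optional[int] = None  # id of the tweet being replied to, if any


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def features(tweet):
    """Split a tweet into a set of plain words and a set of hashtags."""
    tokens = re.findall(r"[#@]?\w+", tweet.text.lower())
    hashtags = {t for t in tokens if t.startswith("#")}
    words = {t for t in tokens if t[0] not in "#@"}  # drop hashtags and mentions
    return words, hashtags


def combined_similarity(a, b, w_text=0.5, w_tags=0.3, w_reply=0.2):
    """Weighted sum of three signals; the default weights are arbitrary guesses."""
    words_a, tags_a = features(a)
    words_b, tags_b = features(b)
    is_reply = a.in_reply_to == b.id or b.in_reply_to == a.id
    return (w_text * jaccard(words_a, words_b)
            + w_tags * jaccard(tags_a, tags_b)
            + w_reply * (1.0 if is_reply else 0.0))


# Made-up tweets, only for illustration.
t1 = Tweet(1, "Any tips for the #iphone jailbreak?")
t2 = Tweet(2, "@bob the #iphone jailbreak worked for me", in_reply_to=1)
t3 = Tweet(3, "Loving my Samsung Galaxy #android")

print(combined_similarity(t1, t2))
print(combined_similarity(t1, t3))
```

Keeping the reply link as one bounded term means a reply that has drifted to another subject still scores low on the text and hashtag terms, so the link alone cannot force two tweets into the same topic. The weights would need tuning on real data.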

For finding common topics on Twitter, looking at the connections between the users who discuss those topics really helps: we found that groups of users tend to discuss a common topic. There is more detail on this in the second half of that post.

Tags: machine-learning, twitter, cluster-analysis, information-retrieval, topic-modeling

Oscar Mederos
