I'm working on a micro-forum of sorts, whereby a quick (close to tweet-size) topic message is posted by a special user, which subscribers can respond to with like-sized messages of their own. Straightforward, no 'digging' or voting of any sort, just a chronological flow of responses for each topic message. But with high traffic expected.
We would like to flag topic messages according to the response buzz they attract, using a scale of 0 to 10.
Been googling for trend algorithms and open source community application examples for a while, and so far have gleaned two interesting references, which I don't fully grok yet:
Understanding algorithms for measuring trends, a discussion on comparing wikipedia pageviews using the Baseline Trend Algorithm, here on SO.
The Britney Spears Problem, an in-depth article on how to rank search terms, while processing large streams of data.
From the first I understand the need to check the slope in activity, and to balance the weight between two items that differ greatly in scale of activity. But how do I compare many items, growing in number quickly across time? And then, how do I break the items into "buzz grades" from 0 to 10?
The second reference is fascinating, but over my head at this point. From a first pass I've understood the need to keep memory usage stable while keeping counters and storing references to items if necessary. But I haven't figured a fitting algorithm for my specific use case from it, yet.
It's worth noting that I come from a non-computer-science and definitely non-statistics background. Please bear with me :) Any help and code samples (especially in Ruby) would be greatly appreciated.
Intuition says that a solution to this problem doesn't need a lot of statistics; ranking the topics based on some simple measures may already provide you with an interesting selection of "trending topics."
One way is to order the topics by the number of comments generated in the last hour/day/week and to select the top ones.
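As a rough sketch of this first approach in plain Ruby: the data layout (a hash from topic id to an array of comment timestamps) and the method name `top_topics_by_recent_comments` are made up for illustration, not taken from any real schema.

```ruby
# Hypothetical input: { topic_id => [Time, Time, ...] }, one Time per comment.
# Returns [[topic_id, recent_comment_count], ...], busiest topic first.
def top_topics_by_recent_comments(comments_by_topic, window_seconds:, now: Time.now, limit: 10)
  cutoff = now - window_seconds

  # Count only the comments that fall inside the time window.
  counts = comments_by_topic.map do |topic_id, timestamps|
    [topic_id, timestamps.count { |t| t >= cutoff }]
  end

  # Sort descending by count and keep the top `limit` topics.
  counts.sort_by { |_, count| -count }.first(limit)
end
```

Calling it with `window_seconds: 3600` ranks by the last hour; `86_400` would rank by the last day.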
Another way is to count the number of comments for each of the topics and divide this by the "age" of the topic. New topics that immediately generate comments will be considered trending, while older topics with many comments will be less trending as they grow older.
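A minimal sketch of the comments-per-age idea, with one possible way to turn the scores into the 0 to 10 grades asked about in the question. Several things here are assumptions of mine rather than part of the approach described above: the hash keys (`:id`, `:comments`, `:created_at`), the one-hour floor on age (so that brand-new topics don't get huge scores from a single comment), and the linear scaling against the currently hottest topic.

```ruby
# Comments per hour of topic age. The age is floored at one hour (an assumption)
# so a topic that is a few seconds old does not divide by a near-zero age.
def buzz_score(comment_count, created_at, now: Time.now)
  age_hours = [(now - created_at) / 3600.0, 1.0].max
  comment_count / age_hours
end

# Hypothetical input: an array of hashes with :id, :comments and :created_at.
# Returns { topic_id => grade }, where the hottest topic gets 10 and the
# others are scaled linearly against it.
def buzz_grades(topics, now: Time.now)
  scores = topics.to_h { |t| [t[:id], buzz_score(t[:comments], t[:created_at], now: now)] }
  max = scores.values.max
  return scores.transform_values { 0 } if max.nil? || max.zero?

  scores.transform_values { |s| (10.0 * s / max).round }
end
```

Because the grades are relative to the busiest topic at the moment of calculation, they would need to be recomputed periodically; a fixed scale (for example, a set number of comments per hour counting as grade 10) is an alternative if grades should be comparable over time.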
These implementations can easily be created in Ruby/Rails and can even be done in an SQL query, provided that the tables contain publish dates and numbers of comments.