
How can I find the top 10 hashtags from a stream of a billion tweets?

This was an interview question that someone asked me, and I didn't really have a good answer. I was wondering if someone could help me understand the solution:

"You have a stream of a billion tweets coming in. How will you figure out the top 10 hashtags?"

Thanks

brainydexter asked Jul 05 '12 18:07



2 Answers

Create a map, with a hashtag as the key and a counter as a value.

Increment the counter of each tag in each tweet you receive.

Examine the value of the counters to find the top 10.
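The three steps above can be sketched in Python roughly as follows. This is only a minimal illustration: the `top_hashtags` name, the hashtag regex, and the choice to fold tags to lowercase are my assumptions, not part of the question, and it assumes the whole map of counters fits in memory.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

def top_hashtags(tweets: Iterable[str], k: int = 10) -> List[Tuple[str, int]]:
    """Count every hashtag in the stream and return the k most frequent."""
    counts = Counter()  # the map: hashtag -> counter
    for text in tweets:
        # Increment the counter of each tag in this tweet (case-insensitive).
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    # Examine the counters to find the top k.
    return counts.most_common(k)
```

Because `tweets` is any iterable, this works on a generator that yields tweets as they arrive. Time is linear in the number of tags, but memory grows with the number of distinct hashtags.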

Your phrasing of the question doesn't include any constraints that would prohibit this straightforward solution. In an interview situation, I would have asked clarifying questions to elicit these constraints.

Under constraints like, "it has to run in linear time," and, "it has to use a constant amount of memory," much more interesting answers emerge.


I am not sure if there is a constant memory solution to the problem as posed, but I know one for a related (and often more useful) problem: identifying elements that constitute a given fraction of results. I gave it as an answer to a similar question.

(I say, "more useful", because if the total fraction of a given item falls below a threshold, it's more likely to be noise than true "Top 10" material.)
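For illustration, here is a sketch of one well-known algorithm for that related problem, the Misra-Gries frequent-items summary. I am assuming it as an example; it is not necessarily the method from the answer mentioned above. It keeps at most `k - 1` counters and guarantees that any item making up more than `1/k` of the stream is still present at the end. The function name and parameters are hypothetical.

```python
from typing import Dict, Hashable, Iterable

def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Return candidates for items occurring more than n/k times (n = stream length).

    Uses at most k - 1 counters. The stored counts are underestimates, and
    some candidates may be false positives, so an exact answer needs a
    second pass over the data to verify them.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free slot: decrement every counter and drop the ones at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

For example, calling it with `k=100` would keep at most 99 counters and retain every hashtag that accounts for more than 1% of the stream, regardless of how many distinct hashtags appear.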

erickson answered Oct 20 '22 00:10


You probably can't analyze all the tweets, so you analyze a random sample instead. The top 10 hashtags in that sample approximate the overall top 10, to a degree of certainty that depends on the size and quality of the sample.
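One way to draw a uniform random sample from a stream of unknown length is reservoir sampling (Algorithm R). The sketch below is my own illustration under that assumption, not something the interviewer necessarily had in mind. It assumes the stream yields one hashtag at a time, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, List, Optional, Tuple

def reservoir_sample(stream: Iterable[str], size: int,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Keep a uniform random sample of `size` items using O(size) memory."""
    rng = rng or random.Random()
    reservoir: List[str] = []
    for i, item in enumerate(stream):
        if i < size:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)    # uniform in [0, i], inclusive
            if j < size:
                reservoir[j] = item  # replace with probability size / (i + 1)
    return reservoir

def approx_top_hashtags(stream: Iterable[str], size: int, k: int = 10,
                        rng: Optional[random.Random] = None) -> List[Tuple[str, int]]:
    """Estimate the top k hashtags from a random sample of the stream."""
    return Counter(reservoir_sample(stream, size, rng)).most_common(k)
```

Memory is bounded by the sample size rather than by the number of distinct hashtags. A larger sample gives a more reliable ranking, especially for tags whose true frequencies are close together.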

I don't think they were looking for an actual solution here; they were more likely probing your thought process on how you might approach a (practically) impossible problem.

Eric Petroelje answered Oct 19 '22 22:10