
Extracting hashtags out of Twitter trending topics data with Python Tweepy

I have the following problem:

Using the Twitter API and the tweepy module, I want to monitor the trending topics and extract the hashtags from the data.

This code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

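import tweepy, json

# Placeholder credentials; substitute your own application's values.
CONSUMER_KEY = 'key'
CONSUMER_SECRET = 'secret'
ACCESS_KEY = 'key'
ACCESS_SECRET = 'secret'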
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)

trends1 = api.trends_place(1)
print trends1

gives me data about globally trending topics that is structured like this:

[{u'created_at': u'2014-04-16T12:13:15Z', u'trends': [{u'url': u'http://twitter.com/search?q=%22South+Korea%22', u'query': u'%22South+Korea%22', u'name': u'South Korea', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23FETUSONEDIRECTIONDAY', u'query': u'%23FETUSONEDIRECTIONDAY', u'name': u'#FETUSONEDIRECTIONDAY', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23PrayForSouthKorea', u'query': u'%23PrayForSouthKorea', u'name': u'#PrayForSouthKorea', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23GaraGaraRP', u'query': u'%23GaraGaraRP', u'name': u'#GaraGaraRP', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A', u'query': u'%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A', u'name': u'#\u0625\u0633\u0645_\u0623\u0645\u064a_\u0628\u062c\u0648\u0627\u0644\u064a', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa', u'query': u'%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa', u'name': u'#Kad\u0131nlarKamyon\u015eof\xf6r\xfcOlursa', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%22Dear+My+BestFriend%22', u'query': u'%22Dear+My+BestFriend%22', u'name': u'Dear My BestFriend', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%22%D0%A1%D0%B0%D0%BC%D0%BE%D0%BE%D0%B1%D0%BE%D1%80%D0%BE%D0%BD%D0%B0+100%22', u'query': u'%22%D0%A1%D0%B0%D0%BC%D0%BE%D0%BE%D0%B1%D0%BE%D1%80%D0%BE%D0%BD%D0%B0+100%22', u'name': u'\u0421\u0430\u043c\u043e\u043e\u0431\u043e\u0440\u043e\u043d\u0430 100', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%22If+I+Stay%22', u'query': u'%22If+I+Stay%22', u'name': u'If I Stay', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=Gabashvili', u'query': u'Gabashvili', u'name': u'Gabashvili', u'promoted_content': None}], u'as_of': 
u'2014-04-16T12:20:29Z', u'locations': [{u'woeid': 1, u'name': u'Worldwide'}]}]

Is this a Python list containing several dictionaries? How can I extract the hashtags from that data and save them into new variables?

I'm new to Python, so please explain your choices.


bcrvc asked Apr 16 '14 12:04


2 Answers

In your example the list has a single entry: a dict whose 'trends' key holds a list of further dicts, one per trend. The key you are interested in is 'name', and in particular whether its value starts with '#' (here the data is assigned to temp):

In [180]:

[x for x in temp[0]['trends'] if x['name'].find('#') == 0]

Out[180]:

[{'name': '#FETUSONEDIRECTIONDAY',
  'promoted_content': None,
  'query': '%23FETUSONEDIRECTIONDAY',
  'url': 'http://twitter.com/search?q=%23FETUSONEDIRECTIONDAY'},
 {'name': '#PrayForSouthKorea',
  'promoted_content': None,
  'query': '%23PrayForSouthKorea',
  'url': 'http://twitter.com/search?q=%23PrayForSouthKorea'},
 {'name': '#GaraGaraRP',
  'promoted_content': None,
  'query': '%23GaraGaraRP',
  'url': 'http://twitter.com/search?q=%23GaraGaraRP'},
 {'name': '#إسم_أمي_بجوالي',
  'promoted_content': None,
  'query': '%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A',
  'url': 'http://twitter.com/search?q=%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A'},
 {'name': '#KadınlarKamyonŞoförüOlursa',
  'promoted_content': None,
  'query': '%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa',
  'url': 'http://twitter.com/search?q=%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa'}]

EDIT: To get just the hashtags:

In [181]:

[x['name'] for x in temp[0]['trends'] if x['name'].find('#') == 0]

You can use startswith instead of find:

[x['name'] for x in temp[0]['trends'] if x['name'].startswith('#')]
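
To try the two variants side by side without calling the API, here is a minimal sketch. The temp list below is a hand-trimmed sample shaped like the response in the question (an assumption for illustration, not real API output), and the names with_find and with_startswith are made up for this example.

```python
# Hand-trimmed sample shaped like the response in the question.
# It is illustrative data, not live output from the Twitter API.
temp = [{
    'trends': [
        {'name': 'South Korea', 'query': '%22South+Korea%22'},
        {'name': '#PrayForSouthKorea', 'query': '%23PrayForSouthKorea'},
        {'name': '#GaraGaraRP', 'query': '%23GaraGaraRP'},
        {'name': 'Gabashvili', 'query': 'Gabashvili'},
    ],
    'locations': [{'woeid': 1, 'name': 'Worldwide'}],
}]

trends = temp[0]['trends']

# find() returns the index of the first '#', so 0 means "at the very start".
with_find = [t['name'] for t in trends if t['name'].find('#') == 0]

# startswith() states the same test directly and only looks at the prefix.
with_startswith = [t['name'] for t in trends if t['name'].startswith('#')]

print(with_find == with_startswith)
print(with_startswith)
```

Both expressions select the same names; startswith is usually preferred because it says what is meant and does not scan the rest of the string.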
EdChum answered Sep 19 '22 16:09


Your data is a list containing one dictionary. One of the keys in this dictionary is called trends. The value for this key is a list of dictionaries. Each of these dictionaries contains a key called name, which holds the trend's name as a string; the trend is a hashtag when that string starts with '#'. Here's an example of accessing your data:

hashtags = []
trends = data[0]['trends']
for trend in trends:
    name = trend['name']
    if name.startswith('#'):
        # Keep only the names that are hashtags.
        hashtags.append(name)

This can be compacted to:

hashtags = [trend['name'] for trend in data[0]['trends'] if trend['name'].startswith('#')]

First three lines of output:

>>> for hashtag in hashtags:
...     print(hashtag)
...
#FETUSONEDIRECTIONDAY
#PrayForSouthKorea
#GaraGaraRP
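
If you want to reuse this, the same loop can be wrapped in a function. The following is only a sketch under a few assumptions: the function name extract_hashtags and the sample data are hypothetical, and the outer list is looped over in case it ever holds more than one entry, although the response in the question has exactly one.

```python
def extract_hashtags(data):
    """Collect trend names that start with '#' from every entry in data."""
    hashtags = []
    for entry in data:                         # outer list of dicts
        for trend in entry.get('trends', []):  # inner list: one dict per trend
            name = trend.get('name') or ''     # tolerate a missing or None name
            if name.startswith('#'):
                hashtags.append(name)
    return hashtags


# Hypothetical sample with the same shape as the response in the question.
data = [{
    'trends': [
        {'name': 'South Korea'},
        {'name': '#FETUSONEDIRECTIONDAY'},
        {'name': '#PrayForSouthKorea'},
        {'name': 'If I Stay'},
    ],
    'locations': [{'woeid': 1, 'name': 'Worldwide'}],
}]

# Confirm the structure asked about in the question: a list of dicts.
print(isinstance(data, list), isinstance(data[0], dict))

hashtags = extract_hashtags(data)
print(hashtags)
```

Using .get() with a default means a malformed entry is skipped instead of raising a KeyError, which may or may not be what you want when monitoring trends over time.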
Scorpion_God answered Sep 21 '22 16:09
