I have the following problem:
using the Twitter API and the tweepy module, I want to monitor trending topics and extract the hashtags from the data.
This code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import tweepy, json
CONSUMER_KEY = 'key'
CONSUMER_SECRET = 'secret'
ACCESS_KEY = 'key'
ACCESS_SECRET = 'secret'
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_KEY, ACCESS_SECRET)
api = tweepy.API(auth)
# WOEID 1 means "Worldwide" (see 'locations' in the output below)
trends1 = api.trends_place(1)
print trends1
gives me data about globally trending topics that is structured like this:
[{u'created_at': u'2014-04-16T12:13:15Z', u'trends': [{u'url': u'http://twitter.com/search?q=%22South+Korea%22', u'query': u'%22South+Korea%22', u'name': u'South Korea', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23FETUSONEDIRECTIONDAY', u'query': u'%23FETUSONEDIRECTIONDAY', u'name': u'#FETUSONEDIRECTIONDAY', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23PrayForSouthKorea', u'query': u'%23PrayForSouthKorea', u'name': u'#PrayForSouthKorea', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23GaraGaraRP', u'query': u'%23GaraGaraRP', u'name': u'#GaraGaraRP', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A', u'query': u'%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A', u'name': u'#\u0625\u0633\u0645_\u0623\u0645\u064a_\u0628\u062c\u0648\u0627\u0644\u064a', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa', u'query': u'%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa', u'name': u'#Kad\u0131nlarKamyon\u015eof\xf6r\xfcOlursa', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%22Dear+My+BestFriend%22', u'query': u'%22Dear+My+BestFriend%22', u'name': u'Dear My BestFriend', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%22%D0%A1%D0%B0%D0%BC%D0%BE%D0%BE%D0%B1%D0%BE%D1%80%D0%BE%D0%BD%D0%B0+100%22', u'query': u'%22%D0%A1%D0%B0%D0%BC%D0%BE%D0%BE%D0%B1%D0%BE%D1%80%D0%BE%D0%BD%D0%B0+100%22', u'name': u'\u0421\u0430\u043c\u043e\u043e\u0431\u043e\u0440\u043e\u043d\u0430 100', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=%22If+I+Stay%22', u'query': u'%22If+I+Stay%22', u'name': u'If I Stay', u'promoted_content': None}, {u'url': u'http://twitter.com/search?q=Gabashvili', u'query': u'Gabashvili', u'name': u'Gabashvili', u'promoted_content': None}], u'as_of': 
u'2014-04-16T12:20:29Z', u'locations': [{u'woeid': 1, u'name': u'Worldwide'}]}]
Is this a Python list containing several dictionaries? How can I extract the hashtags from that data and save them into new variables?
I'm new to Python, so please explain your choices.
Thanks!
Your list has a single entry: a dict whose 'trends' key holds a list of dicts, one per trend. Within each of those, the key you are interested in is 'name', specifically when its value starts with '#'. Assuming temp holds the list printed above:
In [180]:
[x for x in temp[0]['trends'] if x['name'].find('#') ==0]
Out[180]:
[{'name': '#FETUSONEDIRECTIONDAY',
'promoted_content': None,
'query': '%23FETUSONEDIRECTIONDAY',
'url': 'http://twitter.com/search?q=%23FETUSONEDIRECTIONDAY'},
{'name': '#PrayForSouthKorea',
'promoted_content': None,
'query': '%23PrayForSouthKorea',
'url': 'http://twitter.com/search?q=%23PrayForSouthKorea'},
{'name': '#GaraGaraRP',
'promoted_content': None,
'query': '%23GaraGaraRP',
'url': 'http://twitter.com/search?q=%23GaraGaraRP'},
{'name': '#إسم_أمي_بجوالي',
'promoted_content': None,
'query': '%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A',
'url': 'http://twitter.com/search?q=%23%D8%A5%D8%B3%D9%85_%D8%A3%D9%85%D9%8A_%D8%A8%D8%AC%D9%88%D8%A7%D9%84%D9%8A'},
{'name': '#KadınlarKamyonŞoförüOlursa',
'promoted_content': None,
'query': '%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa',
'url': 'http://twitter.com/search?q=%23Kad%C4%B1nlarKamyon%C5%9Eof%C3%B6r%C3%BCOlursa'}]
EDIT: To get just the hashtags:
In [181]:
[x['name'] for x in temp[0]['trends'] if x['name'].find('#') ==0]
Out[181]:
['#FETUSONEDIRECTIONDAY',
'#PrayForSouthKorea',
'#GaraGaraRP',
'#إسم_أمي_بجوالي',
'#KadınlarKamyonŞoförüOlursa']
You can use startswith instead of find:
[x['name'] for x in temp[0]['trends'] if x['name'].startswith('#')]
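To illustrate why startswith is the less error-prone spelling, here is a minimal sketch on a hand-made sample shaped like the response above. The `sample` list and the 'Not#AHashtag' entry are made up for illustration and are not real API output. Both filters pick the same names, but find returns -1 when '#' is absent, so it only works when compared explicitly against 0.

```python
# Hand-made sample mimicking the structure of the trends response.
# Assumption: only the 'name' key matters here; real entries carry more keys.
sample = [{'trends': [{'name': 'South Korea'},
                      {'name': '#PrayForSouthKorea'},
                      {'name': 'Not#AHashtag'}]}]

names = [t['name'] for t in sample[0]['trends']]

# Both of these keep only names whose first character is '#'.
with_find = [n for n in names if n.find('#') == 0]
with_startswith = [n for n in names if n.startswith('#')]

# Pitfall: find() returns -1 (truthy) when '#' is missing and 0 (falsy) when
# '#' is the first character, so a bare find() condition inverts the test.
bare_find = [n for n in names if n.find('#')]

print(with_find)
print(with_startswith)
print(bare_find)
```

The first two lists are identical, while the third keeps exactly the names that are not hashtags, which is the kind of mistake startswith avoids.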
Your data is a list containing one dictionary. One of the keys in this dictionary is called trends. The value for this key is a list of dictionaries. Each of these dictionaries contains a key called name, which holds the trend's name as a string; the names that start with # are the hashtags. Here's an example of accessing your data:
hashtags = []
trends = data[0]['trends']
for trend in trends:
    name = trend['name']
    if name.startswith('#'):
        hashtags.append(name)
This can be compacted to:
hashtags = [trend['name'] for trend in data[0]['trends'] if trend['name'].startswith('#')]
First three lines of output:
>>> for hashtag in hashtags:
...     print(hashtag)
...
#FETUSONEDIRECTIONDAY
#PrayForSouthKorea
#GaraGaraRP
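If you want this as a reusable step, one possible sketch wraps the loop in a function. The name `extract_hashtags` is hypothetical, and the sample `data` only copies a few entries from the output in the question. Using `.get` with defaults is a defensive assumption that keys might be missing; nothing in the response above shows that actually happening.

```python
def extract_hashtags(data):
    """Return the names of all trends that start with '#'.

    `data` is expected to look like the list printed in the question:
    a list of dicts, each with a 'trends' list of dicts that have a 'name'.
    """
    hashtags = []
    for entry in data:  # the question's response has a single entry
        for trend in entry.get('trends', []):
            name = trend.get('name', '')
            if name.startswith('#'):
                hashtags.append(name)
    return hashtags


# Sample trimmed from the output shown in the question.
data = [{'trends': [{'name': 'South Korea', 'query': '%22South+Korea%22'},
                    {'name': '#FETUSONEDIRECTIONDAY',
                     'query': '%23FETUSONEDIRECTIONDAY'},
                    {'name': '#PrayForSouthKorea',
                     'query': '%23PrayForSouthKorea'},
                    {'name': 'Gabashvili', 'query': 'Gabashvili'}],
         'locations': [{'woeid': 1, 'name': 'Worldwide'}]}]

hashtags = extract_hashtags(data)
print(hashtags)
```

Returning a list keeps the result in a single variable, which is usually easier to work with than creating one variable per hashtag.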