Not able to scrape geo coordinates with tweets [Lat-Lon]

I am trying to download tweets using the Tweepy API, but I am not able to get geo coordinates in my output.

I am looking for a way to include latitude and longitude in the output data.

Any help is appreciated. Thanks in advance. The code is written for Python 3.x, and a screenshot of the output is attached below the code.

I have seen that some users don't share their location details, yet I am still able to scrape their tweets for that geographic area. So even if I could just add the lat-lon to the output through the program, that would be great.

Code

import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import pandas as pd
import json
import csv
import sys
import time

#reload(sys)
#sys.setdefaultencoding('utf8')

ckey = 'XXXXX'
csecret = 'XXXXXXX'
atoken = 'XXXXXX'
asecret = 'XXXXXX'

def toDataFrame(tweets):
    # Convert the list of tweets to a data frame
    DataSet = pd.DataFrame()

    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    DataSet['tweetText'] = [tweet.text for tweet in tweets]  # str is already Unicode in Python 3
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]
    tweets_place= []
    #users_retweeted = []
    for tweet in tweets:
        if tweet.place:
            tweets_place.append(tweet.place.full_name)
        else:
            tweets_place.append('null')
    DataSet['TweetPlace'] = [i for i in tweets_place]
    #DataSet['UserWhoRetweeted'] = [i for i in users_retweeted]

    return DataSet

OAUTH_KEYS = {'consumer_key':ckey, 'consumer_secret':csecret,'access_token_key':atoken, 'access_token_secret':asecret}
#auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
auth = tweepy.AppAuthHandler('XXXXXXXX', 'XXXXX')

api = tweepy.API(auth, wait_on_rate_limit=True,wait_on_rate_limit_notify=True)
if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)
else:
    print ("Scraping data now") # geocode is "latitude,longitude,radius" with the radius in km; optionally add q='ganesh'
    cursor = tweepy.Cursor(api.search,geocode="23.50000,91.16000,50km",since='2017-09-01',until='2017-09-05',lang='en',count=100) # the search API returns at most 100 tweets per page
    results=[]
    for item in cursor.items(1000): # drop the 1000 argument to fetch every available tweet
        results.append(item)


    DataSet = toDataFrame(results)
    DataSet.to_csv('Agartala_sep_1_4.csv',index=False)
    print ("Completed.. !!")

Output: [screenshot]

Sitz Blogz asked Sep 04 '17 21:09


3 Answers

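Within the given code, this additional block worked for me. It runs one search per centre point and tags every returned tweet with the latitude, longitude and radius of the search that found it.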

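# Assumes df is a pandas DataFrame with 'latitude' and 'longitude' columns
# holding the search centres (df is not defined in the question's code).
radius_km = 50
for i in range(len(df)):
    lat, lon = df['latitude'][i], df['longitude'][i]
    x = "%s,%s,%dkm" % (lat, lon, radius_km)
    # The search API returns at most 100 tweets per page
    cursor = tweepy.Cursor(api.search, geocode=x, since='2017-09-14', until='2017-09-15', lang='en', count=100)
    print(i)
    results = list(cursor.items(1000))  # drop the 1000 argument to fetch every available tweet
    DataSet = toDataFrame(results)
    # Tag every tweet with the centre and radius of the search that returned it
    DataSet['latitude'] = lat
    DataSet['longitude'] = lon
    DataSet['radius'] = radius_km  # same radius as in the geocode string
    del DataSet['Coordinates']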
Sitz Blogz answered Sep 29 '22 19:09


If your tweet.coordinates is not None, then it is the GeoJSON object (a dict) returned by the API. It seems possible that the CSV writer just writes a blank for that field if it doesn't know what to do with the object.

You could try to parse the object into latitude and longitude and save each one in a separate column. Alternatively, convert the object to some other representation (a string, for example) that your DataFrame can write to CSV.

Something like this, perhaps:

# tweet.coordinates looks like {"type": "Point", "coordinates": [longitude, latitude]}
longitude, latitude = tweet.coordinates["coordinates"]
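
To make the parsing suggestion a bit more concrete, here is a minimal sketch of splitting the GeoJSON point into two column-ready lists. It assumes the usual Twitter layout, {"type": "Point", "coordinates": [longitude, latitude]}. The helper name split_coordinates and the sample values are made up for illustration and are not part of Tweepy.

```python
def split_coordinates(coordinates):
    """Return (latitude, longitude) from a tweet's GeoJSON point, or (None, None)."""
    if not coordinates:
        return None, None
    # GeoJSON stores the pair as [longitude, latitude]
    longitude, latitude = coordinates["coordinates"]
    return latitude, longitude


# Hypothetical sample values standing in for tweet.coordinates
samples = [
    {"type": "Point", "coordinates": [91.16, 23.5]},
    None,  # a tweet without an exact location
]

rows = [split_coordinates(c) for c in samples]
latitudes = [lat for lat, _ in rows]
longitudes = [lon for _, lon in rows]
print(latitudes, longitudes)
```

Inside toDataFrame, the same helper could be applied to tweet.coordinates for each tweet to fill separate latitude and longitude columns instead of the single Coordinates column.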
Benedicti Regula answered Sep 29 '22 20:09


The coordinates field can be null; it depends on the permissions the user has granted on Twitter. You could query a service that takes a place name as input and returns the coordinates of that place. I usually use geocoder:

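import geocoder

for tweet in tweets:
    # Only geocode tweets that lack exact coordinates but do name a place
    if tweet.coordinates is None and tweet.place is not None:
        # Pass the place name (a string), not the Place object itself
        result = geocoder.arcgis(tweet.place.full_name)
        # x is the longitude and y is the latitude of the match.
        # Note that this replaces the Place object with a plain tuple.
        tweet.place = (result.x, result.y)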

If you don't like the ArcGIS service (which has no API usage limit), you could query Google, Bing, GeoNames and more. Take a look at the docs: http://geocoder.readthedocs.io/
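
The fallback idea can be sketched without touching the network: prefer the tweet's exact coordinates and only geocode the place name when they are missing. This is a rough sketch under assumptions. resolve_location and fake_lookup are hypothetical names, the lookup is passed in as a plain function so the geocoding service can be swapped, and the Agartala coordinates are approximate values used only for illustration.

```python
def resolve_location(coordinates, place_name, lookup):
    """Return (latitude, longitude), preferring exact coordinates over a geocoded place name."""
    if coordinates:
        # GeoJSON stores the pair as [longitude, latitude]
        longitude, latitude = coordinates["coordinates"]
        return latitude, longitude
    if place_name:
        # lookup is expected to return (latitude, longitude) or (None, None)
        return lookup(place_name)
    return None, None


def fake_lookup(name):
    # Hypothetical stand-in for a geocoding service, with approximate values
    known = {"Agartala, India": (23.83, 91.28)}
    return known.get(name, (None, None))


exact = {"type": "Point", "coordinates": [91.16, 23.5]}
print(resolve_location(exact, "Agartala, India", fake_lookup))
print(resolve_location(None, "Agartala, India", fake_lookup))
print(resolve_location(None, None, fake_lookup))
```

With the geocoder package, the lookup could wrap geocoder.arcgis(name) and return its latlng pair. Caching the results per place name would avoid repeating the same request for every tweet from the same place.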

Lupanoide answered Sep 29 '22 21:09