I am trying to scrape both Instagram and Twitter based on geolocation. I can run a search query, but I am having trouble reloading the web page to load more results and storing the fields in a data frame.
I did find a couple of examples for scraping Twitter and Instagram without API keys, but they search by hashtag keywords.
I am trying to scrape by geolocation and between past dates. This is how far I have come, writing code in Python 3.x with the latest versions of all packages in Anaconda.
'''
Instagram - Components
"id": "1478232643287060472",
"dimensions": {"height": 1080, "width": 1080},
"owner": {"id": "351633262"},
"thumbnail_src": "https://instagram.fdel1-1.fna.fbcdn.net/t51.2885-15/s640x640/sh0.08/e35/17439262_973184322815940_668652714938335232_n.jpg",
"is_video": false,
"code": "BSDvMHOgw_4",
"date": 1490439084,
"taken-at=213385402"
"display_src": "https://instagram.fdel1-1.fna.fbcdn.net/t51.2885-15/e35/17439262_973184322815940_668652714938335232_n.jpg",
"caption": "Hakuna jambo zuri kama kumpa Mungu shukrani kwa kila jambo.. \ud83d\ude4f\ud83c\udffe\nIts weekend\n#lifeistooshorttobeunhappy\n#Godisgood \n#happysoul \ud83d\ude00",
"comments": {"count": 42},
"likes": {"count": 3813}},
'''
import selenium
from selenium import webdriver
#from selenium import selenium
from bs4 import BeautifulSoup
import pandas
#geotags = pd.read_csv("geocodes.csv")
#parmalink =
query = geocode%3A35.68501%2C139.7514%2C30km%20since:2016-03-01%20until:2016-03-02&f=tweets
twitterURL = 'https://twitter.com/search?q=' + query
#instaURL = "https://www.instagram.com/explore/locations/213385402/"
browser = webdriver.Firefox()
browser.get(twitterURL)
content = browser.page_source
soup = BeautifulSoup(content)
print (soup)
For the Twitter search query I am getting a syntax error.
For Instagram I am not getting any error, but I am not able to reload the page for more posts or write the results to a CSV file or data frame.
I am also trying to search by latitude and longitude on both Twitter and Instagram.
I have a list of geo-coordinates in a CSV file; I can use that as input, or I can write a search query.
Any way to complete the scraping by location would be appreciated.
I appreciate the help!
The Python package Instagramy can be used to scrape Instagram quickly and easily. How fast it retrieves the data depends on your network connection.
We can automate the Instagram login page with Selenium WebDriver in Java. To achieve this, we first launch the Instagram login page, then identify elements such as the email field, the password field, and the login button with the findElement method, and finally interact with them.
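Selenium can be driven from Python in the same way, and it is also one way to approach the "reload for more posts" problem from the question. The sketch below is only an illustration: the helper name `scroll_to_load_more` and its parameters are hypothetical, and it assumes the page loads further results when scrolled to the bottom, which may not hold for every site. It relies only on the WebDriver method `execute_script`.

```python
import time


def scroll_to_load_more(driver, max_scrolls=10, pause=2.0):
    """Scroll to the bottom repeatedly until the page stops growing.

    `driver` is any Selenium WebDriver, e.g. webdriver.Firefox().
    Returns the number of scrolls that actually loaded new content.
    (Hypothetical helper, not part of Selenium itself.)
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    loaded = 0
    for _ in range(max_scrolls):
        # Jump to the bottom of the page to trigger loading of more items.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page time to fetch the next batch
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            break  # the page did not grow, so assume nothing more will load
        last_height = new_height
        loaded += 1
    return loaded
```

After calling `scroll_to_load_more(browser)`, `browser.page_source` should contain the expanded page, which can then be passed to BeautifulSoup as in the question's code.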
Twitter's terms forbid unauthorized web scraping: "scraping the Services without the prior consent of Twitter is expressly prohibited." Breaking these terms is usually a civil rather than a criminal matter. Twitter data is scraped frequently, and problems are rarely reported.
I managed to make it work using requests. Your code would look something like this:
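from bs4 import BeautifulSoup
import requests

# The query must be a quoted string; leaving it unquoted causes the syntax error.
query = "geocode%3A35.68501%2C139.7514%2C30km%20since:2016-03-01%20until:2016-03-02&f=tweets"
twitter = 'https://twitter.com/search?q=' + query
# Fetch the search page and parse the returned HTML.
content = requests.get(twitter)
soup = BeautifulSoup(content.text, "html.parser")  # name the parser explicitly
print(soup)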
Then you can use the soup object to parse what you need. The same thing should work for Instagram, if your query is correct.
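Since the question mentions a CSV list of coordinates, here is a minimal sketch of building one search URL per coordinate instead of hard-coding the query string. It reuses the `geocode`, `since`, and `until` operators from the query above. The function names and the CSV column names `lat` and `lon` are assumptions for illustration; adjust them to match the actual file.

```python
import csv
import io
from urllib.parse import quote

BASE_URL = "https://twitter.com/search?q="


def build_search_url(lat, lon, radius_km, since, until):
    """Return a search URL for tweets near (lat, lon) within a date range."""
    query = f"geocode:{lat},{lon},{radius_km}km since:{since} until:{until}"
    # quote() percent-encodes ':', ',' and spaces so the query is URL-safe.
    return BASE_URL + quote(query, safe="") + "&f=tweets"


def urls_from_csv(file_obj, radius_km, since, until):
    """Build one search URL per row of a CSV with 'lat' and 'lon' columns.

    The column names are an assumption; rename them to fit your data.
    """
    reader = csv.DictReader(file_obj)
    return [
        build_search_url(row["lat"], row["lon"], radius_km, since, until)
        for row in reader
    ]


if __name__ == "__main__":
    # In-memory stand-in for a file such as geocodes.csv.
    sample = io.StringIO("lat,lon\n35.68501,139.7514\n")
    for url in urls_from_csv(sample, 30, "2016-03-01", "2016-03-02"):
        print(url)
```

Each URL can then be fetched and parsed as shown above. If you collect the parsed fields as a list of dictionaries, `pandas.DataFrame(rows)` turns them into a data frame, and `to_csv` writes that to disk.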