Web Scraping with Selenium Python [Twitter + Instagram]

I am trying to scrape both Instagram and Twitter based on geolocation. I can run a search query, but I am having trouble reloading the web page to load more results and storing the fields in a DataFrame.

I did find a couple of examples for scraping Twitter and Instagram without API keys, but they search by hashtag keywords.

I want to scrape by geolocation and between past dates. So far I have come this far, writing code in Python 3.x with the latest versions of the packages in Anaconda.

'''
    Instagram - Components
    "id": "1478232643287060472", 
     "dimensions": {"height": 1080, "width": 1080}, 
     "owner": {"id": "351633262"}, 
     "thumbnail_src": "https://instagram.fdel1-1.fna.fbcdn.net/t51.2885-15/s640x640/sh0.08/e35/17439262_973184322815940_668652714938335232_n.jpg", 
     "is_video": false, 
     "code": "BSDvMHOgw_4", 
     "date": 1490439084, 
     "taken-at=213385402"
     "display_src": "https://instagram.fdel1-1.fna.fbcdn.net/t51.2885-15/e35/17439262_973184322815940_668652714938335232_n.jpg", 
     "caption": "Hakuna jambo zuri kama kumpa Mungu shukrani kwa kila jambo.. \ud83d\ude4f\ud83c\udffe\nIts weekend\n#lifeistooshorttobeunhappy\n#Godisgood \n#happysoul \ud83d\ude00", 
     "comments": {"count": 42}, 
     "likes": {"count": 3813}}, 
'''


import selenium
from selenium import webdriver
#from selenium import selenium
from bs4 import BeautifulSoup
import pandas

#geotags = pd.read_csv("geocodes.csv")
#parmalink = 
query = geocode%3A35.68501%2C139.7514%2C30km%20since:2016-03-01%20until:2016-03-02&f=tweets

twitterURL = 'https://twitter.com/search?q=' + query
#instaURL = "https://www.instagram.com/explore/locations/213385402/"


browser = webdriver.Firefox()
browser.get(twitterURL)
content = browser.page_source

soup = BeautifulSoup(content)
print (soup)

For the Twitter search query I am getting a syntax error.

For Instagram I am not getting any error, but I am not able to reload the page for more posts or write the results back to a CSV file or DataFrame.

I am also trying to search by latitude and longitude on both Twitter and Instagram.

I have a list of geo coordinates in a CSV file; I can use that as input or write a search query from it.

Any way to complete the scraping by location would be appreciated.

Appreciate the help!

asked Mar 26 '17 19:03

Sitz Blogz

1 Answer

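I managed to make it work using requests. Note that the syntax error in your code comes from the query not being wrapped in quotes, so Python tries to parse it as an expression. Your code would look something like this: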

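from bs4 import BeautifulSoup
import requests

# The query must be a quoted string; it is already percent-encoded.
query = "geocode%3A35.68501%2C139.7514%2C30km%20since:2016-03-01%20until:2016-03-02&f=tweets"

twitter = 'https://twitter.com/search?q=' + query

# Fetch the raw HTML of the search page.
content = requests.get(twitter)

# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(content.text, "html.parser")

print(soup)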
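
The query string above is encoded by hand, which is easy to get wrong. As a rough sketch only, it could be assembled from a latitude, longitude, radius and date range with the standard library. The function name build_search_url and the CSV column names lat and lon are hypothetical, and the in-memory sample stands in for the geocodes.csv mentioned in the question.

```python
import csv
import io
from urllib.parse import quote


def build_search_url(lat, lon, radius_km, since, until):
    """Assemble a Twitter search URL for a geocode and a date range."""
    raw_query = f"geocode:{lat},{lon},{radius_km}km since:{since} until:{until}"
    # quote() percent-encodes ':', ',' and spaces; safe="" leaves no
    # reserved character unescaped.
    return "https://twitter.com/search?q=" + quote(raw_query, safe="") + "&f=tweets"


# Stand-in for a file such as geocodes.csv; the column names are assumptions.
sample_csv = io.StringIO("lat,lon\n35.68501,139.7514\n")

urls = [
    build_search_url(row["lat"], row["lon"], 30, "2016-03-01", "2016-03-02")
    for row in csv.DictReader(sample_csv)
]

for url in urls:
    print(url)
```

Each URL in urls can then be passed to requests.get in place of the hard-coded one.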

Once the page is fetched, you can use the soup object to parse what you need. The same approach should work for Instagram, provided your query is correct.
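
requests only sees the HTML returned by the first response, so it will not pick up posts that a page loads as you scroll. If you stay with Selenium for that, one common pattern is to scroll to the bottom repeatedly until the page height stops growing. The helper below is a sketch of that pattern, not something tested against the current Twitter or Instagram pages; scroll_until_stable, pause and max_rounds are names made up for illustration, and it assumes the site appends content on scroll.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom until the page height stops growing.

    `driver` is a Selenium WebDriver (or anything with execute_script).
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        # Give the page time to fetch and render the next batch.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was appended
        last_height = new_height
    return rounds


# Hypothetical usage with the browser from the question:
#   browser.get(twitterURL)
#   scroll_until_stable(browser)
#   soup = BeautifulSoup(browser.page_source, "html.parser")
```

The max_rounds cap keeps the loop from running forever on a feed that never ends.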

answered Sep 27 '22 16:09

Fernando Cezar

Tags: python, pandas, twitter, web-scraping, instagram
