New to programming and figured out how to navigate to where I need to go using Selenium. I'd like to parse the data now but not sure where to start. Can someone hold my hand a sec and point me in the right direction?
Any help appreciated -
The combination of Beautiful Soup and Selenium will do the job of dynamic scraping. Selenium automates web browser interaction from Python, so data that is only rendered after JavaScript runs (for example, behind links or button clicks) can first be made available by driving the browser with Selenium and then extracted with Beautiful Soup.
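For instance, a minimal sketch of that click-then-parse flow might look like the following; the URL, the 'More' link, and the table rows are hypothetical placeholders for whatever your actual page contains:

# Sketch only: click a JavaScript-driven link with Selenium, then parse the
# resulting HTML with Beautiful Soup. Replace the URL and 'More' with real values.
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('http://example.com/listing')           # hypothetical page
driver.find_element(By.LINK_TEXT, 'More').click()  # trigger the JavaScript-driven load
soup = BeautifulSoup(driver.page_source, 'html.parser')
for row in soup.find_all('tr'):                    # parse whatever the browser now shows
    print(row.get_text(strip=True))
driver.quit()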
Assuming you are on the page you want to parse, Selenium stores the source HTML in the driver's page_source attribute. You would then load page_source into BeautifulSoup as follows:
In [8]: from bs4 import BeautifulSoup
In [9]: from selenium import webdriver
In [10]: driver = webdriver.Firefox()
In [11]: driver.get('http://news.ycombinator.com')
In [12]: html = driver.page_source
In [13]: soup = BeautifulSoup(html, 'html.parser')
In [14]: for tag in soup.find_all('title'):
   ....:     print(tag.text)
   ....:
Hacker News
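One caveat worth flagging: on pages that build their content with JavaScript, page_source can be read before the content exists. A hedged sketch using Selenium's explicit waits is shown below; the URL and the id 'content' are placeholders, so substitute whatever element signals that your own page has finished rendering:

# Sketch only: wait for a marker element before grabbing the HTML.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('http://example.com')  # placeholder URL

# Wait up to 10 seconds for a hypothetical element with id 'content' to appear,
# then hand the rendered HTML to Beautiful Soup.
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'content')))
soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()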
As your question isn't particularly concrete, here's a simple example. To do something more useful, read the BS docs. You will also find plenty of examples of Selenium (and BS) usage here on SO.
from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://webpage.com')
soup = BeautifulSoup(browser.page_source, 'html.parser')

# do something useful
# prints all the links with their corresponding text
for link in soup.find_all('a'):
    print(link.get('href', None), link.get_text())
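If you want something slightly more structured than printing every anchor, here is a hedged sketch along the same lines; the URL is the same placeholder as above, and the idea is simply to keep only anchors that actually carry an href and collect them into a list:

# Sketch only: gather (text, href) pairs instead of printing as you go.
from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://webpage.com')  # placeholder URL
soup = BeautifulSoup(browser.page_source, 'html.parser')
browser.quit()

links = [(a.get_text(strip=True), a['href']) for a in soup.find_all('a', href=True)]
for text, href in links:
    print(text, '->', href)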