
How can I parse a website using Selenium and BeautifulSoup in Python? [closed]

I'm new to programming and have figured out how to navigate to where I need to go using Selenium. I'd like to parse the data now, but I'm not sure where to start. Can someone hold my hand a sec and point me in the right direction?

Any help appreciated -

asked Dec 19 '12 by twitch after coffee


People also ask

Can you use Selenium and BeautifulSoup together?

Yes. The combination of Beautiful Soup and Selenium will do the job of dynamic scraping. Selenium automates web-browser interaction from Python, so content rendered by JavaScript can be made available by automating the necessary clicks with Selenium and then extracted with Beautiful Soup.
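As a minimal sketch of that click-then-parse pattern: the URL, the button selector, and the result markup below are hypothetical placeholders, not taken from the question.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('http://example.com')  # hypothetical page whose results are rendered by JavaScript
driver.find_element(By.CSS_SELECTOR, 'button.load-more').click()  # hypothetical button that triggers the JS

soup = BeautifulSoup(driver.page_source, 'html.parser')  # parse the HTML Selenium has rendered
for item in soup.find_all('div', class_='result'):  # hypothetical result markup
    print(item.get_text(strip=True))

driver.quit()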


2 Answers

Assuming you are on the page you want to parse, Selenium stores the source HTML in the driver's page_source attribute. You would then load the page_source into BeautifulSoup as follows:

In [8]: from bs4 import BeautifulSoup

In [9]: from selenium import webdriver

In [10]: driver = webdriver.Firefox()

In [11]: driver.get('http://news.ycombinator.com')

In [12]: html = driver.page_source

In [13]: soup = BeautifulSoup(html)

In [14]: for tag in soup.find_all('title'):
   ....:     print tag.text
   ....:
Hacker News
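The session above is Python 2. On Python 3 with a current bs4, the same approach would look roughly like this; it is only a sketch of the answer above, with 'html.parser' chosen to make the parser explicit:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://news.ycombinator.com')

# page_source holds the HTML of the page the driver is currently on
soup = BeautifulSoup(driver.page_source, 'html.parser')

for tag in soup.find_all('title'):
    print(tag.text)

driver.quit()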
answered Oct 14 '22 by RocketDonkey


As your question isn't particularly concrete, here's a simple example. To do something more useful, read the BS docs. You will also find plenty of examples of Selenium (and BS) usage here on SO.

from selenium import webdriver
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://webpage.com')

soup = BeautifulSoup(browser.page_source)

# do something useful
# prints all the links with corresponding text
for link in soup.find_all('a'):
    print link.get('href', None), link.get_text()
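If the page builds its content with JavaScript, page_source may be read before the links exist. One way to guard against that is to wait for an element first. Here is a sketch using Selenium's WebDriverWait, keeping the placeholder URL from the answer above; the ten-second timeout and waiting on an 'a' tag are assumptions for illustration:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get('http://webpage.com')

# wait up to 10 seconds for at least one link to be present before parsing
WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.TAG_NAME, 'a')))

soup = BeautifulSoup(browser.page_source, 'html.parser')
for link in soup.find_all('a'):
    print(link.get('href', None), link.get_text())

browser.quit()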
answered Oct 14 '22 by root