
BeautifulSoup: Get the contents of a specific table

My local airport's website disgracefully blocks users who aren't using IE, and it looks awful. I want to write a Python script that fetches the contents of the Arrivals and Departures pages every few minutes and shows them in a more readable way.
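A minimal sketch of the "every few minutes" refresh loop, assuming the download and display steps are passed in as plain callables. fetch and show are hypothetical placeholders, and the 300-second interval is an assumed value; sleep is injectable only so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def poll(fetch: Callable[[], str],
         show: Callable[[str], None],
         interval_seconds: float = 300,
         iterations: int = 3,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() and pass its result to show(), pausing between rounds.

    fetch and show are hypothetical hooks (e.g. "download the Arrivals
    page" and "print it nicely"). Returns the number of rounds completed.
    """
    rounds = 0
    for i in range(iterations):
        show(fetch())
        rounds += 1
        if i < iterations - 1:
            sleep(interval_seconds)  # wait before the next refresh
    return rounds
```

For an unattended script you would pass a large iteration count or turn the for loop into a while True loop.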

My tools of choice are mechanize, to trick the site into believing I use IE, and BeautifulSoup, to parse the page and extract the flight data table.
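The question uses mechanize for this step. The sketch below swaps in the standard library's urllib.request instead and simply sends an Internet Explorer User-Agent header. It assumes the site decides who is "using IE" from that header alone, which may not hold for every site; the User-Agent string and the helper names are illustrative, not taken from the site.

```python
from urllib.request import Request, urlopen

# Illustrative IE User-Agent string (an assumption; any IE string may do).
IE_USER_AGENT = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"


def make_ie_request(url: str) -> Request:
    """Build a request whose User-Agent header claims to be Internet Explorer."""
    return Request(url, headers={"User-Agent": IE_USER_AGENT})


def fetch_as_ie(url: str) -> bytes:
    """Download url while presenting the IE User-Agent header."""
    with urlopen(make_ie_request(url)) as response:
        return response.read()
```

Note that Request normalizes header names with str.capitalize(), so the header is stored as "User-agent".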

Quite honestly, I got lost in the BeautifulSoup documentation, and can't understand how to get the table (whose title I know) from the entire document, and how to get a list of rows from that table.
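If "title" means a visible caption rather than a title attribute (an assumption; the question doesn't say which), one way to locate the table is to find the caption element and climb to its enclosing table. The sample markup and the helper rows_of_captioned_table are hypothetical, for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the airport page.
SAMPLE_HTML = """
<html><body>
  <table><caption>Departures</caption>
    <tr><th>Flight</th><th>Time</th></tr>
    <tr><td>LY001</td><td>10:00</td></tr>
  </table>
  <table><caption>Arrivals</caption>
    <tr><th>Flight</th><th>Time</th></tr>
    <tr><td>LY002</td><td>11:30</td></tr>
    <tr><td>LY003</td><td>12:15</td></tr>
  </table>
</body></html>
"""


def rows_of_captioned_table(html: str, caption_text: str):
    """Return the <tr> tags of the table whose <caption> text equals caption_text."""
    soup = BeautifulSoup(html, "html.parser")
    caption = soup.find("caption", string=caption_text)
    if caption is None:
        return []  # no table carries that caption
    table = caption.find_parent("table")
    return table.find_all("tr")


rows = rows_of_captioned_table(SAMPLE_HTML, "Arrivals")
print(len(rows))  # → 3 (one header row plus two flights)
```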

Any ideas?

Adam Matan asked May 29 '10 15:05




2 Answers

This is not the specific code you need, just a demo of how to work with BeautifulSoup. It finds the table whose id is "Table1" and gets all of its tr elements.

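```python
from urllib.request import urlopen  # Python 3 (urllib2 in Python 2)

from bs4 import BeautifulSoup

html = urlopen(url).read()
bs = BeautifulSoup(html, "html.parser")
# Find the <table> whose id attribute is "Table1"
table = bs.find(lambda tag: tag.name == 'table' and tag.has_attr('id') and tag['id'] == "Table1")
# Collect every <tr> element inside that table
rows = table.findAll(lambda tag: tag.name == 'tr')
```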
Ofri Raviv answered Oct 20 '22 02:10
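With BeautifulSoup 4 the lambda above isn't strictly needed: keyword arguments to find() filter on attributes, and select() accepts CSS selectors. A small sketch on made-up markup standing in for the real page:

```python
from bs4 import BeautifulSoup

# Made-up markup; only the table with id "Table1" should match.
page_html = """
<table id="Table1">
  <tr><td>LY001</td><td>10:00</td></tr>
  <tr><td>LY002</td><td>11:30</td></tr>
</table>
<table id="Other"><tr><td>x</td></tr></table>
"""

soup = BeautifulSoup(page_html, "html.parser")

# Keyword arguments filter on attributes, so no lambda is required.
table = soup.find("table", id="Table1")
rows = table.find_all("tr")

# Equivalent CSS selector: every <tr> under the table whose id is Table1.
rows_via_css = soup.select("table#Table1 tr")
```

Both approaches return the same two rows here; which one to use is mostly a matter of taste.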


```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(HTML, "html.parser")

# the first argument to find tells it what tag to search for;
# the second lets you pass a dict of attr->value pairs to filter
# results that match the first tag
table = soup.find("table", {"title": "TheTitle"})

rows = list()
for row in table.findAll("tr"):
    rows.append(row)

# now rows contains each tr in the table (as a BeautifulSoup object)
# and you can search them to pull out the times
```
goggin13 answered Oct 20 '22 01:10
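To go one step further and "pull out the times", here is a sketch that turns each row into a list of cell strings and prints them as aligned columns. The sample table, its columns, and the helpers table_to_lists and format_rows are hypothetical; format_rows assumes every row has the same number of cells.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's columns will differ.
page = """
<table title="TheTitle">
  <tr><th>Flight</th><th>From</th><th>Time</th></tr>
  <tr><td>LY002</td><td>London</td><td>11:30</td></tr>
  <tr><td>LY003</td><td>Paris</td><td>12:15</td></tr>
</table>
"""


def table_to_lists(table):
    """Turn each <tr> into a list of its cell texts (th or td), whitespace stripped."""
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


def format_rows(rows):
    """Left-align each column to its widest cell, one line per row.

    Assumes all rows have the same number of cells.
    """
    widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
    return "\n".join(
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


soup = BeautifulSoup(page, "html.parser")
table = soup.find("table", {"title": "TheTitle"})
data = table_to_lists(table)
print(format_rows(data))
```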