 

Python BeautifulSoup: iterate over table


I am trying to scrape table data into a CSV file. Unfortunately, I've hit a roadblock: the following code simply repeats the TD cells from the first TR for all subsequent TRs.

```python
import urllib.request
from bs4 import BeautifulSoup

f = open('out.txt','w')

url = "http://www.international.gc.ca/about-a_propos/atip-aiprp/reports-rapports/2012/02-atip_aiprp.aspx"
page = urllib.request.urlopen(url)

soup = BeautifulSoup(page)

soup.unicode

table1 = soup.find("table", border=1)
table2 = soup.find('tbody')
table3 = soup.find_all('tr')

for td in table3:
    rn = soup.find_all("td")[0].get_text()
    sr = soup.find_all("td")[1].get_text()
    d = soup.find_all("td")[2].get_text()
    n = soup.find_all("td")[3].get_text()

    print(rn + "," + sr + "," + d + ",", file=f)
```

This is my first-ever Python script, so any help would be appreciated! I have looked over answers to other questions but cannot figure out what I am doing wrong here.

asked Apr 25 '12 at 04:04 by Will





1 Answer

You're searching from the top level of your document every time you call soup.find() or soup.find_all(), so when you ask for, for example, all the "td" tags, you get every "td" tag in the document, not just those in the table and row you have searched for. As your code is written, you might as well not search for the table, tbody, and rows at all, because the loop never looks inside them.
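To see the difference concretely, here is a small self-contained sketch. The two-row table is made-up markup standing in for the real page, and the built-in "html.parser" is assumed as the parser. Indexing into soup.find_all("td") always returns the first cell on the page, while row.find_all("td") only looks inside the current row.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table standing in for the real page.
HTML = """
<table border="1">
  <tbody>
    <tr><td>A1</td><td>A2</td></tr>
    <tr><td>B1</td><td>B2</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Searching from `soup` scans the whole document on every iteration,
# so index [0] is always the first <td> on the page (this mimics the question's loop).
from_document = [soup.find_all("td")[0].get_text() for row in soup.find_all("tr")]

# Searching from `row` scans only the cells inside that <tr>.
from_row = [row.find_all("td")[0].get_text() for row in soup.find_all("tr")]

print(from_document)  # ['A1', 'A1']
print(from_row)       # ['A1', 'B1']
```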

I think you want to do something like this:

```python
table1 = soup.find("table", border=1)
table2 = table1.find('tbody')
table3 = table2.find_all('tr')
```
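One thing to watch when chaining calls like this: find() returns None when nothing matches, so the next .find() raises AttributeError. A common case is a missing tbody: browsers insert one into the DOM (so it shows up in developer tools), but "html.parser" does not add one that is absent from the raw HTML. The sketch below, assuming made-up markup and a hypothetical helper named table_rows, searches the table itself for rows, which works either way because find_all() searches all descendants.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same table written with and without <tbody>.
WITH_TBODY = '<table border="1"><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table>'
WITHOUT_TBODY = '<table border="1"><tr><td>a</td></tr><tr><td>b</td></tr></table>'


def table_rows(html):
    """Return the <tr> tags of the first <table border="1">, searching inside that table only."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", border="1")
    if table is None:  # find() returns None when nothing matches
        raise ValueError('no <table border="1"> in the page')
    # find_all() is recursive, so rows are found with or without a <tbody>.
    return table.find_all("tr")


print(len(table_rows(WITH_TBODY)), len(table_rows(WITHOUT_TBODY)))  # 2 2
```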

Or, you know, something more like this, with more descriptive variable names to boot:

```python
rows = soup.find("table", border=1).find("tbody").find_all("tr")

for row in rows:
    cells = row.find_all("td")
    rn = cells[0].get_text()
    # and so on
```
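Since the goal in the question is a CSV file, here is a fuller sketch along the same lines that writes each row with Python's csv module instead of joining strings with ","; the csv writer quotes fields that contain commas, which plain string joining does not. It assumes inline made-up HTML in place of the real page, four data columns per row, and header rows made of th cells; the function name table_to_csv is hypothetical.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; the real page's columns may differ.
HTML = """
<table border="1">
  <tbody>
    <tr><th>No.</th><th>Summary</th><th>Disposition</th><th>Pages</th></tr>
    <tr><td>A-1</td><td>Request one</td><td>All disclosed</td><td>12</td></tr>
    <tr><td>A-2</td><td>Request, with comma</td><td>Partly disclosed</td><td>3</td></tr>
  </tbody>
</table>
"""


def table_to_csv(html, out):
    """Write every <tr> that has at least four <td> cells as one CSV row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", border="1")
    writer = csv.writer(out)
    for row in table.find_all("tr"):
        cells = row.find_all("td")  # search inside this row only
        if len(cells) < 4:          # skip header rows, which use <th>
            continue
        writer.writerow([cell.get_text(strip=True) for cell in cells[:4]])


buffer = io.StringIO()
table_to_csv(HTML, buffer)
print(buffer.getvalue())
```

To write to a real file instead of a buffer, open it with open('out.csv', 'w', newline='') as the csv module documentation recommends, and pass that file object in place of buffer.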
answered Sep 30 '22 at 07:09 by kindall
