I am trying to scrape a table with BeautifulSoup. I wrote this Python code:

    import urllib2
    from bs4 import BeautifulSoup

    url = "http://dofollow.netsons.org/table1.htm"  # change to whatever your url is
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page)
    for i in soup.find_all('form'):
        print i.attrs['class']

I need to scrape the Nome, Cognome, and Email columns.
Loop over the table rows (tr tags) and get the text of the cells (td tags) inside:

    for tr in soup.find_all('tr')[2:]:
        tds = tr.find_all('td')
        print "Nome: %s, Cognome: %s, Email: %s" % \
              (tds[0].text, tds[1].text, tds[2].text)

prints:

    Nome: Massimo, Cognome: Allegri, Email: [email protected]
    Nome: Alessandra, Cognome: Anastasia, Email: [email protected]
    ...

FYI, the [2:] slice here skips the two header rows.
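The code in this answer uses Python 2 syntax (the print statement). As a rough sketch only, the same loop in Python 3 might look like the following. The sample HTML, the `extract_rows` helper, and its `header_rows` parameter are made up for illustration, and skipping rows with fewer than three cells is an added assumption, not something the original page is known to need.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: two header rows, then data.
SAMPLE_HTML = """
<table>
  <tr><th colspan="3">Contacts</th></tr>
  <tr><th>Nome</th><th>Cognome</th><th>Email</th></tr>
  <tr><td>Mario</td><td>Rossi</td><td>mario.rossi@example.com</td></tr>
  <tr><td> Anna </td><td>Bianchi</td><td>anna.bianchi@example.com</td></tr>
  <tr><td>incomplete row</td></tr>
</table>
"""


def extract_rows(html, header_rows=2):
    """Return (nome, cognome, email) tuples from the rows after the headers."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr")[header_rows:]:
        tds = tr.find_all("td")
        if len(tds) < 3:
            continue  # assumption: ignore spacer or malformed rows
        # get_text(strip=True) trims surrounding whitespace in each cell
        rows.append(tuple(td.get_text(strip=True) for td in tds[:3]))
    return rows


for nome, cognome, email in extract_rows(SAMPLE_HTML):
    print("Nome: %s, Cognome: %s, Email: %s" % (nome, cognome, email))
```

For the live page you would pass the downloaded HTML to `extract_rows` in place of `SAMPLE_HTML`; in Python 3 the download is done with `urllib.request.urlopen` rather than `urllib2.urlopen`.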
UPD: here's how you can save the results to a text file:

    with open('output.txt', 'w') as f:
        for tr in soup.find_all('tr')[2:]:
            tds = tr.find_all('td')
            f.write("Nome: %s, Cognome: %s, Email: %s\n" % \
                    (tds[0].text, tds[1].text, tds[2].text))

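If the file is meant to be loaded into a spreadsheet or another script, CSV may be easier to work with than free-form text lines. Here is a minimal Python 3 sketch using the standard `csv` module. The `write_rows_csv` helper, the sample rows, and the `output.csv` filename are hypothetical, and the rows are assumed to be (nome, cognome, email) tuples built from the `tds` in the loop above.

```python
import csv
import io


def write_rows_csv(rows, fileobj):
    """Write a header line, then one CSV line per (nome, cognome, email) tuple."""
    writer = csv.writer(fileobj)
    writer.writerow(["Nome", "Cognome", "Email"])
    writer.writerows(rows)


# Hypothetical rows; the second one shows that embedded commas get quoted.
rows = [
    ("Mario", "Rossi", "mario.rossi@example.com"),
    ("Anna", "De Luca, Jr.", "anna@example.com"),
]

# Written to an in-memory buffer here so the result can be inspected directly.
buffer = io.StringIO()
write_rows_csv(rows, buffer)
print(buffer.getvalue())

# To write a real file instead:
# with open("output.csv", "w", newline="", encoding="utf-8") as f:
#     write_rows_csv(rows, f)
```

The `csv` module takes care of quoting, so a cell containing a comma does not break the column layout the way a hand-formatted line could.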