python BeautifulSoup parsing table

I'm learning Python requests and BeautifulSoup. As an exercise, I've chosen to write a quick NYC parking ticket parser. I can get an HTML response, but it is quite ugly. I need to grab the lineItemsTable and parse all the tickets.

You can reproduce the page by going to https://paydirect.link2gov.com/NYCParking-Plate/ItemSearch and entering the NY plate T630134C.

    soup = BeautifulSoup(plateRequest.text)
    #print(soup.prettify())
    #print soup.find_all('tr')

    table = soup.find("table", { "class" : "lineItemsTable" })
    for row in table.findAll("tr"):
        cells = row.findAll("td")
        print cells

Can someone please help me out? Simply looking for all tr elements does not get me anywhere.

Here you go:

    # soup is the BeautifulSoup object created from plateRequest.text in the question
    data = []
    table = soup.find('table', attrs={'class': 'lineItemsTable'})
    table_body = table.find('tbody')

    rows = table_body.find_all('tr')
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]   # cell text, surrounding whitespace removed
        data.append([ele for ele in cols if ele])  # Get rid of empty values

This gives you:

    [ [u'1359711259', u'SRF', u'08/05/2013', u'5310 4 AVE', u'K', u'19', u'125.00', u'$'],
      [u'7086775850', u'PAS', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'125.00', u'$'],
      [u'7355010165', u'OMT', u'12/14/2013', u'3908 6th Ave', u'K', u'40', u'145.00', u'$'],
      [u'4002488755', u'OMT', u'02/12/2014', u'NB 1ST AVE @ E 23RD ST', u'5', u'115.00', u'$'],
      [u'7913806837', u'OMT', u'03/03/2014', u'5015 4th Ave', u'K', u'46', u'115.00', u'$'],
      [u'5080015366', u'OMT', u'03/10/2014', u'EB 65TH ST @ 16TH AV E', u'7', u'50.00', u'$'],
      [u'7208770670', u'OMT', u'04/08/2014', u'333 15th St', u'K', u'70', u'65.00', u'$'],
      [u'$0.00\n\n\nPayment Amount:'] ]

A couple of things to note:

The last row in the output above, Payment Amount, is not a ticket, but that is how the table is laid out. You can filter it out by checking whether the length of the list is less than 7.
The last column of every row has to be handled separately, since it is an input text box.
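A minimal, self-contained sketch that folds both notes into the loop above. The HTML string is a hypothetical stand-in shaped like the output shown earlier, not the real page's markup; the function name `parse_line_items`, the `min_cells` threshold of 7 (taken from the first note), and the assumption that the text box's contents live in its `value` attribute are all illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the lineItemsTable rows shown above;
# the real page's HTML may differ.
SAMPLE_HTML = """
<table class="lineItemsTable">
  <tr>
    <td>1359711259</td><td>SRF</td><td>08/05/2013</td><td>5310 4 AVE</td>
    <td>K</td><td>19</td><td>125.00</td><td>$</td>
    <td><input type="text" value="0.00"/></td>
  </tr>
  <tr>
    <td>4002488755</td><td>OMT</td><td>02/12/2014</td><td>NB 1ST AVE @ E 23RD ST</td>
    <td></td><td>5</td><td>115.00</td><td>$</td>
    <td><input type="text" value="0.00"/></td>
  </tr>
  <tr><td colspan="9">$0.00 Payment Amount:</td></tr>
</table>
"""


def parse_line_items(html, min_cells=7):
    """Return (tickets, box_values) parsed from the table with class lineItemsTable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "lineItemsTable"})
    # html.parser does not add a missing <tbody>, so fall back to the table itself
    body = table.find("tbody") or table

    tickets, box_values = [], []
    for row in body.find_all("tr"):
        cols = [td.text.strip() for td in row.find_all("td")]
        cols = [c for c in cols if c]  # get rid of empty values (including the input cell)
        if len(cols) < min_cells:
            continue  # skips the trailing "Payment Amount" row
        tickets.append(cols)
        # The input box has no text content; read its value attribute instead
        box = row.find("input")
        box_values.append(box.get("value") if box else None)
    return tickets, box_values


tickets, box_values = parse_line_items(SAMPLE_HTML)
for ticket in tickets:
    print(ticket)
```

The `or table` fallback matters because whether a `<tbody>` exists depends on the page and on the parser: html.parser keeps the markup as written, so `table.find('tbody')` can return `None` even when a browser's inspector shows one.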

Updated Answer

If a programmer only needs to parse a table from a webpage, they can use the pandas function pandas.read_html.

Let's say we want to extract the GDP data table from the website: https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries

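Then the following code does the job without any BeautifulSoup code or manual HTML handling (read_html itself relies on a parser backend, lxml or BeautifulSoup with html5lib, under the hood):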

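    from io import StringIO

    import pandas as pd
    import requests

    url = "https://worldpopulationreview.com/countries/countries-by-gdp/#worldCountries"

    r = requests.get(url)
    r.raise_for_status()  # stop early on an HTTP error page
    # read_html parses every <table> on the page into a list of DataFrames;
    # newer pandas versions expect literal HTML to be wrapped in StringIO
    df_list = pd.read_html(StringIO(r.text))
    df = df_list[0]
    print(df.head())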

Output

[Image: first five rows of the table from the website]
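read_html can also pick out one table when a page has several: its `attrs` parameter restricts parsing to tables whose HTML attributes match. A minimal sketch, assuming a hypothetical HTML string that reuses the lineItemsTable class from the question (the column names and values are illustrative only). The string is wrapped in `StringIO` because newer pandas versions deprecate passing literal HTML, and read_html still needs lxml, or BeautifulSoup plus html5lib, installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical markup: two tables, only one of which carries the class we want
HTML = """
<table class="summary"><tr><th>Total</th></tr><tr><td>0.00</td></tr></table>
<table class="lineItemsTable">
  <thead><tr><th>Violation</th><th>Date</th><th>Amount</th></tr></thead>
  <tbody>
    <tr><td>SRF</td><td>08/05/2013</td><td>125.00</td></tr>
    <tr><td>PAS</td><td>12/14/2013</td><td>125.00</td></tr>
  </tbody>
</table>
"""

# attrs filters which <table> elements are parsed; the result is still a list
tables = pd.read_html(StringIO(HTML), attrs={"class": "lineItemsTable"})
df = tables[0]
print(df)
```

Header cells in `<thead>` become the column names, so no manual header handling is needed for a well-formed table.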

Tags: python, beautifulsoup

Question by Cmag; answers by shaktimaan and BhishanPoudel.
