
How can I select every nth child in BeautifulSoup?

From the table below, I have scraped Items 1-4 and stored them in a variable called headings.

I would also like to select Values 1-4 and store them in a variable called columns. Is there any way to select every second <td>? Something like:

columns = boxinfo.find_all("td").nthChild(2)

Table structure I am scraping from

<div class="box1">
<table class="table1">
<tr><td class="label">Item1</td><td>Value1</td></tr>
<tr><td class="label">Item2</td><td>Value2</td></tr>
<tr><td class="label">Item3</td><td>Value3</td></tr>
<tr><td class="label">Item4</td><td>Value4</td></tr>
</table>
</div>

Code

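# Find our information
boxinfo = soup.find("div", {"class": "box1"})  # the div in the HTML above has class="box1", not id="box1"
headings = boxinfo.find_all("td", {"class": "label"})
columns = boxinfo.find_all("td").nthChild(2)  # This does not work :( find_all() returns a ResultSet, which has no nthChild() method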
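BeautifulSoup has no nthChild() method, but the idea behind the nthChild(2) call above maps onto the CSS :nth-child() pseudo-class, which select() supports through the soupsieve package (a dependency of current beautifulsoup4 releases). A minimal sketch, assuming the sample table above and that the wrapper div is located by its class rather than an id:

```python
from bs4 import BeautifulSoup

html = """<div class="box1">
<table class="table1">
<tr><td class="label">Item1</td><td>Value1</td></tr>
<tr><td class="label">Item2</td><td>Value2</td></tr>
<tr><td class="label">Item3</td><td>Value3</td></tr>
<tr><td class="label">Item4</td><td>Value4</td></tr>
</table>
</div>"""

soup = BeautifulSoup(html, "html.parser")
boxinfo = soup.find("div", class_="box1")

# "td.label" matches <td> elements carrying class="label"
headings = boxinfo.select("td.label")

# "td:nth-child(2)" matches every <td> that is the second element child of its <tr>
columns = boxinfo.select("td:nth-child(2)")

print([td.get_text() for td in columns])  # ['Value1', 'Value2', 'Value3', 'Value4']
```

Because :nth-child() is evaluated per parent, this picks the second cell of each row regardless of how many rows the table has.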
Ninja2k asked Jan 03 '23 04:01


1 Answer

If you are trying to extract all of the values, you can let BeautifulSoup return all of the rows and then use Python to pick out the cell you want from each one. For example:

from bs4 import BeautifulSoup

html = """<div class="box1">
<table class="table1">
<tr><td class="label">Item1</td><td>Value1</td></tr>
<tr><td class="label">Item2</td><td>Value2</td></tr>
<tr><td class="label">Item3</td><td>Value3</td></tr>
<tr><td class="label">Item4</td><td>Value4</td></tr>
</table>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="box1")
values = []

for tr in div.find_all('tr'):
    # the second <td> in each row holds the value
    values.append(tr.find_all("td")[1].text)

print(values)

This gives you a list of values:

['Value1', 'Value2', 'Value3', 'Value4']
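Since find_all() returns a list-like ResultSet, ordinary Python slicing can also select every second cell directly, which is close to the "every nth child" the question asks for. A minimal sketch, assuming every row has exactly two cells (an extra or missing cell in any row would shift all later positions):

```python
from bs4 import BeautifulSoup

html = """<div class="box1">
<table class="table1">
<tr><td class="label">Item1</td><td>Value1</td></tr>
<tr><td class="label">Item2</td><td>Value2</td></tr>
<tr><td class="label">Item3</td><td>Value3</td></tr>
<tr><td class="label">Item4</td><td>Value4</td></tr>
</table>
</div>"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="box1")

# find_all() returns the cells in document order: Item1, Value1, Item2, Value2, ...
cells = div.find_all("td")

# A slice with step 2 takes every second cell; the start index chooses the column
headings = [td.get_text() for td in cells[0::2]]
values = [td.get_text() for td in cells[1::2]]

print(values)  # ['Value1', 'Value2', 'Value3', 'Value4']
```

The per-row loop is more robust when rows can have different lengths; the slice is shorter when the layout is fixed.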

Or, if you want a list containing all of the data as columns:

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="box1")
columns = []

for tr in div.find_all('tr'):
    columns.append([td.text for td in tr.find_all("td")])

columns = list(zip(*columns))  # transpose the list of rows into a list of columns

print(columns)
print(columns[1])  # display the 2nd column

This gives you:

[('Item1', 'Item2', 'Item3', 'Item4'), ('Value1', 'Value2', 'Value3', 'Value4')]
('Value1', 'Value2', 'Value3', 'Value4')

list(zip(*columns)) is a way of transposing a list of rows into a list of columns.
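To see why the transposition works, here is a small sketch on plain lists (hypothetical sample rows mirroring the table, no BeautifulSoup needed). zip(*rows) unpacks the rows as separate arguments, so zip() pairs up the first cell of every row, then the second, and so on. Note that zip() stops at the shortest row, so itertools.zip_longest is one option when rows can be ragged:

```python
from itertools import zip_longest

# Hypothetical rows, shaped like the [td.text for td in tr.find_all("td")] lists above
rows = [
    ["Item1", "Value1"],
    ["Item2", "Value2"],
    ["Item3", "Value3"],
]

# zip(*rows) is zip(rows[0], rows[1], rows[2]): the i-th tuple holds the i-th cell of each row
columns = list(zip(*rows))
print(columns)  # [('Item1', 'Item2', 'Item3'), ('Value1', 'Value2', 'Value3')]

# With a row that is missing a cell, zip() silently drops the incomplete column
ragged = [["Item1", "Value1"], ["Item2"]]
truncated = list(zip(*ragged))
print(truncated)  # [('Item1', 'Item2')]

# zip_longest() keeps every column and fills the gaps with None
padded = list(zip_longest(*ragged))
print(padded)  # [('Item1', 'Item2'), ('Value1', None)]
```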

Martin Evans answered Jan 05 '23 04:01