
Finding multiple attributes within the span tag in Python

There are two values that I want to scrape from a website. They appear in the following tags:

<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>

I need the values inside the spans whose classes are sp starBig and sp starGryB.

The findAll expression that I am using is:

soup.findAll('span', {'class': ['sp starGryB', 'sp starBig']})

The code runs without any errors, yet it returns no results.

RDPD asked Apr 26 '15 12:04



1 Answer

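As per the docs, assuming Beautiful Soup 4, matching multiple CSS classes with a string like 'sp starGryB' is brittle and should be avoided. The string is compared against the exact value of the class attribute, so the match depends on the order in which the classes are written: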

soup.find_all('span', {'class': 'sp starGryB'})
# [<span class="sp starGryB">2.9</span>]
soup.find_all('span', {'class': 'starGryB sp'})
# []

CSS selectors should be used instead, like so:

soup.select('span.sp.starGryB')
# [<span class="sp starGryB">2.9</span>]
soup.select('span.starGryB.sp')
# [<span class="sp starGryB">2.9</span>]
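If you would rather stay with find_all, an order-independent match can be sketched with a filter function that compares sets of classes. This is only an illustration: the has_classes helper and the sample HTML are made up for this example, and it assumes Beautiful Soup 4 with the built-in html.parser, where tag.get('class') returns a list of class names.

```python
from bs4 import BeautifulSoup

# Sample markup (hypothetical); note the reversed class order on the second span.
html = '<span class="sp starBig">4.1</span><span class="starGryB sp">2.9</span>'
soup = BeautifulSoup(html, "html.parser")


def has_classes(tag, required):
    """Return True if `tag` is a <span> that carries every class in `required`."""
    # For multi-valued attributes such as class, BS4 returns a list of tokens,
    # so comparing sets ignores the order in which the classes were written.
    return tag.name == "span" and set(required) <= set(tag.get("class", []))


matches = soup.find_all(lambda tag: has_classes(tag, ["sp", "starGryB"]))
texts = [m.get_text() for m in matches]
print(texts)
```

Here the second span is found even though its classes are written as "starGryB sp", which the exact-string form of find_all would miss.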

In your case:

items = soup.select('span.sp.starGryB') + soup.select('span.sp.starBig')

or something more sophisticated like:

items = [i for s in ['span.sp.starGryB', 'span.sp.starBig'] for i in soup.select(s)]
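Another option, assuming a Beautiful Soup release recent enough to delegate select() to the soupsieve package (4.7 or later), is to put both selectors into one comma-separated selector group and then read the text of each match. A minimal sketch using the two spans from the question as sample input:

```python
from bs4 import BeautifulSoup

# The two tags from the question, used here as stand-in input.
html = """
<span class="sp starBig">4.1</span>
<span class="sp starGryB">2.9</span>
"""
soup = BeautifulSoup(html, "html.parser")

# A comma-separated selector group matches elements satisfying either selector.
items = soup.select("span.sp.starGryB, span.sp.starBig")

# Pull out the text content of each matched span.
ratings = [item.get_text(strip=True) for item in items]
print(ratings)
```

With a selector group the matches come back in document order rather than in the order the selectors are listed, so for this input the 4.1 span comes first.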
famousgarkin answered Nov 09 '22 23:11
