Find the selected option using BeautifulSoup

I would like to only get the selected options of a select. For example:

<select>
  <option value="0">2002/12</option>
  <option value="1">2003/12</option>
  <option value="2">2004/12</option>
  <option value="3">2005/12</option>
  <option value="4">2006/12</option>
  <option value="5" selected>2007/12</option>
</select>

I know I can do

theSelectTag.findAll('option',attrs={'selected':''})

but that returns all the options. Is there a way to get all the elements where an attribute exists? Please note that I ask for all of them, because the site I'm scraping includes the selected attribute on multiple options.

I'm using Python 2.7 and Beautiful Soup 4.1.2

Eric G asked Feb 13 '13 21:02



1 Answer

Passing True as the attribute's value will match all elements with that attribute:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''<select>
...   <option value="0">2002/12</option>
...   <option value="1">2003/12</option>
...   <option value="2">2004/12</option>
...   <option value="3">2005/12</option>
...   <option value="4">2006/12</option>
...   <option value="5" selected>2007/12</option>
... </select>''')
>>> soup.find_all('option', selected=True)
[<option selected="" value="5">2007/12</option>]
>>> soup.find_all('option', {'selected': True})
[<option selected="" value="5">2007/12</option>]
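Since the question mentions a site that marks several options as selected, here is a minimal sketch of the same idea as a standalone script. The markup is made up for illustration (it is not from the site in question), and the parser is named explicitly as "html.parser", which is an assumption on my part; the answer above relies on the default parser. The point is only that selected=True matches on the attribute being present, whatever its value is.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two options carry the attribute, written two different ways.
html = """
<select multiple>
  <option value="0">2002/12</option>
  <option value="1" selected>2003/12</option>
  <option value="2">2004/12</option>
  <option value="3" selected="selected">2005/12</option>
</select>
"""

# Parser chosen explicitly here (an assumption, not part of the original answer).
soup = BeautifulSoup(html, "html.parser")

# selected=True matches every <option> that has the attribute at all.
selected = soup.find_all("option", selected=True)

# Collect (value, visible text) for each selected option.
selected_pairs = [(opt["value"], opt.get_text()) for opt in selected]
print(selected_pairs)  # [('1', '2003/12'), ('3', '2005/12')]
```

Both the bare selected and selected="selected" forms are picked up, so this should work regardless of how the page writes the attribute.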

And with lxml:

>>> from lxml import etree
>>> root = etree.HTML('''<select>
...   <option value="0">2002/12</option>
...   <option value="1">2003/12</option>
...   <option value="2">2004/12</option>
...   <option value="3">2005/12</option>
...   <option value="4">2006/12</option>
...   <option value="5" selected>2007/12</option>
... </select>''')
>>> root.xpath('//option[@selected]')
[<Element option at 0x228b7d0>]
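
The XPath query returns element objects, so one more step is needed to get at the values. A rough sketch of that step, again with made-up markup containing more than one selected option (the variable names are just for illustration):

```python
from lxml import etree

# Hypothetical markup with two selected options.
html = """
<select multiple>
  <option value="0">2002/12</option>
  <option value="1" selected>2003/12</option>
  <option value="2">2004/12</option>
  <option value="3" selected="selected">2005/12</option>
</select>
"""

root = etree.HTML(html)

# [@selected] keeps only the options on which the attribute is present.
selected = root.xpath("//option[@selected]")

# Read the value attribute and the text content of each matching element.
pairs = [(opt.get("value"), opt.text) for opt in selected]
print(pairs)  # [('1', '2003/12'), ('3', '2005/12')]
```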

Blender answered Sep 28 '22 04:09