How to find all <li>'s within a specific <ul> class?

Environment:

Beautiful Soup 4

Python 2.7.5

Logic:

Use find_all to get the <li> elements that are inside a <ul> with a class of my_class, e.g.:

<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>

Clarification: Just get the 'text' between the <li> tags.

Python Code:

(The find_all below is not correct, I am just putting it in context)

from bs4 import BeautifulSoup, Comment
import re

# open original file
fo = open('file.php', 'r')
# convert to string
fo_string = fo.read()
# close original file
fo.close()
# create beautiful soup object from fo_string
bs_fo_string = BeautifulSoup(fo_string, "lxml")
# get rid of html comments
my_comments = bs_fo_string.findAll(text=lambda text:isinstance(text, Comment))
[my_comment.extract() for my_comment in my_comments]

my_li_list = bs_fo_string.find_all('ul', 'my_class')

print my_li_list
user1063287 asked Jun 22 '13 03:06


2 Answers

This?

>>> html = """<ul class='my_class'>
... <li>thing one</li>
... <li>thing two</li>
... </ul>"""
>>> from bs4 import BeautifulSoup as BS
>>> soup = BS(html)
>>> for ultag in soup.find_all('ul', {'class': 'my_class'}):
...     for litag in ultag.find_all('li'):
...         print litag.text
... 
thing one
thing two

Explanation:

soup.find_all('ul', {'class': 'my_class'}) finds all the ul tags with a class of my_class.

We then find all the li tags inside each of those ul tags and print each one's text.
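
As a self-contained sketch of the same two-step lookup, the loop can be wrapped in a small helper that returns the text rather than printing it. The helper name li_texts, the extra other_class list, and the explicit "html.parser" argument are assumptions added for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup

html = """<ul class='my_class'>
<li>thing one</li>
<li>thing two</li>
</ul>
<ul class='other_class'>
<li>not wanted</li>
</ul>"""

# "html.parser" ships with Python, so no extra parser needs to be installed.
soup = BeautifulSoup(html, "html.parser")


def li_texts(soup, ul_class):
    """Hypothetical helper: text of every <li> inside each <ul> with ul_class."""
    texts = []
    # Step 1: find every <ul> carrying the requested class.
    for ultag in soup.find_all('ul', {'class': ul_class}):
        # Step 2: find every <li> nested inside that <ul>.
        for litag in ultag.find_all('li'):
            texts.append(litag.get_text(strip=True))
    return texts


items = li_texts(soup, 'my_class')
print(items)
```

Because the search for li tags starts from each matching ul rather than from the whole document, the li under other_class is never visited.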

TerryA answered Sep 29 '22 07:09


This does the trick with BeautifulSoup 3; I don't have 4 on this machine.

>>> [li.string for li in bs_fo_string.find('ul', {'class': 'my_class'}).findAll('li')]
[u'thing one', u'thing two']

The idea is to first search for the ul with the 'my_class' class, and then findAll the li's within that ul.

If you had additional ul's with the same class you might want to use a findAll on the ul search as well, and change the list comprehension to be nested.
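
A minimal sketch of that nested variant, written against Beautiful Soup 4 (find_all) rather than the BeautifulSoup 3 spelling used above. The sample HTML with two matching ul's and the "html.parser" argument are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

html = """
<ul class='my_class'><li>thing one</li><li>thing two</li></ul>
<ul class='my_class'><li>thing three</li></ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Nested comprehension: the outer loop walks every matching <ul>,
# the inner loop walks every <li> inside that <ul>.
li_strings = [li.string
              for ul in soup.find_all('ul', {'class': 'my_class'})
              for li in ul.find_all('li')]
print(li_strings)
```

Using find_all for the ul search also avoids the AttributeError that find(...).findAll(...) raises when no ul matches, since find returns None in that case. Note that li.string is None when an li contains more than one child node; get_text() may be a better fit there.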

sberry answered Sep 29 '22 07:09