 

Matching partial ids in BeautifulSoup

I'm using BeautifulSoup and need to find every <div> tag whose id has the form post-#, where # is a number.

For example:

<div id="post-45">...</div>
<div id="post-334">...</div>

I have tried:

html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html)
print soupHandler.findAll('div', id='post-*')

How can I filter this?

Max Frai asked May 13 '10 21:05


3 Answers

You can pass a function to findAll:

>>> print soupHandler.findAll('div', id=lambda x: x and x.startswith('post-'))
[<div id="post-45">...</div>, <div id="post-334">...</div>]

Or a regular expression:

>>> import re
>>> print soupHandler.findAll('div', id=re.compile('^post-'))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
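
For readers on Python 3, here is a minimal sketch of the same two approaches. It assumes the bs4 package (BeautifulSoup 4), where findAll is spelled find_all. The extra sidebar div and the helper name has_post_id are made up for illustration and are not part of the question.

```python
import re
from bs4 import BeautifulSoup

# Sample markup from the question, plus one hypothetical non-matching div.
html = (
    '<div id="post-45">...</div> '
    '<div id="post-334">...</div> '
    '<div id="sidebar">...</div>'
)
soup = BeautifulSoup(html, "html.parser")


def has_post_id(value):
    # Guard against a missing id before testing the prefix.
    return value is not None and value.startswith("post-")


# Approach 1: pass a function as the id filter.
by_function = soup.find_all("div", id=has_post_id)

# Approach 2: pass a compiled regular expression as the id filter.
by_regex = soup.find_all("div", id=re.compile(r"^post-"))

function_ids = [div["id"] for div in by_function]
regex_ids = [div["id"] for div in by_regex]
print(function_ids)
print(regex_ids)
```

Both filters should select the two post-* divs and skip the sidebar one.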
Mark Byers answered Nov 03 '22 03:11


Since the asker wants to match "post-#somenumber#", it's better to be more precise:

import re
[...]
soupHandler.findAll('div', id=re.compile(r"^post-\d+"))
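
To see how much the extra precision matters, the sketch below runs three candidate patterns over a list of id strings using only the standard re module. The id values beyond post-45 and post-334, and the helper matching, are hypothetical and chosen to show the edge cases. The third pattern, with a trailing $, is an additional stricter variant rather than something the question requires.

```python
import re

# Hypothetical id values; only the first two come from the question.
ids = ["post-45", "post-334", "post-", "post-new", "post-12-draft", "repost-7"]

prefix_only = re.compile(r"^post-")       # anything starting with "post-"
digits_after = re.compile(r"^post-\d+")   # "post-" followed by at least one digit
digits_only = re.compile(r"^post-\d+$")   # "post-" followed by digits and nothing else


def matching(pattern, values):
    # Keep the values in which the pattern is found.
    return [v for v in values if pattern.search(v)]


print(matching(prefix_only, ids))
# ['post-45', 'post-334', 'post-', 'post-new', 'post-12-draft']
print(matching(digits_after, ids))
# ['post-45', 'post-334', 'post-12-draft']
print(matching(digits_only, ids))
# ['post-45', 'post-334']
```

Without the trailing $, an id such as post-12-draft still matches, which may or may not be what you want.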
xiamx answered Nov 03 '22 03:11


soupHandler.findAll('div', id=re.compile(r"^post-\d+$"))

looks right to me.

Auston answered Nov 03 '22 03:11