I'm using BeautifulSoup. I need to find every <div> tag whose id has the form post-#, where # is a number.
For example:
<div id="post-45">...</div>
<div id="post-334">...</div>
I have tried:
html = '<div id="post-45">...</div> <div id="post-334">...</div>'
soupHandler = BeautifulSoup(html)
print soupHandler.findAll('div', id='post-*')
How can I filter this?
You can pass a function to findAll:
>>> print soupHandler.findAll('div', id=lambda x: x and x.startswith('post-'))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
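The `x and` guard in that lambda is there because, as far as I know, the filter is also called for tags that have no id at all, in which case it receives None instead of a string. A minimal sketch of the predicate on its own, outside BeautifulSoup (the name `is_post_id` and the sample values are made up for illustration):

```python
def is_post_id(value):
    # value is assumed to be None when a tag has no id attribute,
    # so check it is truthy before calling startswith on it.
    return bool(value and value.startswith('post-'))

# Hypothetical id values, including a tag without an id (None).
sample_ids = ['post-45', 'post-334', 'sidebar', None]
matching_ids = [v for v in sample_ids if is_post_id(v)]
print(matching_ids)
```

Without the guard, calling `None.startswith(...)` would raise an AttributeError on the first tag that lacks an id.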
Or a regular expression:
>>> print soupHandler.findAll('div', id=re.compile('^post-'))
[<div id="post-45">...</div>, <div id="post-334">...</div>]
Since the question asks to match "post-#somenumber#", it is better to be precise and require digits after the prefix:
import re
[...]
soupHandler.findAll('div', id=re.compile(r"^post-\d+"))
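soupHandler.findAll('div', id=re.compile(r"^post-\d+$"))
looks right to me. The trailing $ anchors the end as well, so the whole id has to be "post-" followed only by digits.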
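To see how much each pattern actually accepts, here is a small sketch that uses only the re module, with no BeautifulSoup involved. The sample ids and the helper name `ids_matching` are invented for illustration, and it assumes the compiled pattern is applied with search(), which makes no difference here because every pattern starts with ^:

```python
import re

# Hypothetical id values; only the first two are the kind the question wants.
sample_ids = ['post-45', 'post-334', 'post-', 'post-draft', 'post-12-comments']

def ids_matching(pattern_text):
    # Return the sample ids accepted by the given regular expression.
    pattern = re.compile(pattern_text)
    return [s for s in sample_ids if pattern.search(s)]

for pattern_text in (r'^post-', r'^post-\d+', r'^post-\d+$'):
    print(pattern_text, ids_matching(pattern_text))
```

The prefix-only pattern accepts all five ids, adding \d+ drops 'post-' and 'post-draft', and only the pattern that also ends in $ rejects 'post-12-comments'.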