Input HTML:
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
The desired output: all div elements directly under <div style="display: flex">.
I'm trying to locate the parent div with a CSS selector:
div[style="display: flex"]
This throws an error:
>>> soup.select('div[style="display: flex"]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
It looks like BeautifulSoup tries to interpret the colon as pseudo-class syntax.
I've tried to follow the advice in "Handling a colon in an element ID in a CSS selector", but it still throws errors:
>>> soup.select('div[style="display\: flex"]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1400, in select
'Only the following pseudo-classes are implemented: nth-of-type.')
NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type.
>>> soup.select('div[style="display\3A flex"]')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/bs4/element.py", line 1426, in select
'Unsupported or invalid CSS selector: "%s"' % token)
ValueError: Unsupported or invalid CSS selector: "div[style="displayA"
The Question:
What is the correct way to use/escape a colon in attribute values in BeautifulSoup CSS selectors?
Note that I can work around it with a partial attribute match:
soup.select("div[style$=flex]")
Or, with find_all():
soup.find_all("div", style="display: flex")
Also note that I understand that using style to locate elements is far from a good location technique, but the question itself is meant to be generic and the provided HTML is just an example.
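For completeness, here is a minimal sketch of how the find_all() workaround can produce the desired output without going through the CSS selector parser at all. It assumes the sample HTML above is held in a string and parsed with the standard-library html.parser backend; the variable names (html, parent, children, fruits) are illustrative only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Exact attribute match; no CSS selector parsing is involved,
# so the colon in the value is not a problem.
parent = soup.find("div", style="display: flex")

# recursive=False restricts the search to direct children of the parent.
children = parent.find_all("div", recursive=False)

fruits = [child.get_text(strip=True) for child in children]
print(fruits)  # ['apple', 'peach', 'cucumber']
```

Because the lookup is a plain attribute comparison, this should behave the same on releases before and after the selector fix.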
Update: the issue is fixed as of BeautifulSoup 4.5.0; upgrade if needed:
pip install --upgrade beautifulsoup4
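After upgrading, the selector from the question should work unescaped. The following is a minimal sketch rather than a definitive check: it assumes a BeautifulSoup release at or after the fix, reuses the sample HTML from the question, and adds a "> div" child combinator (not in the original selector) to collect the direct children. Variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div style="display: flex">
<div class="half" style="font-size: 0.8em;width: 33%;"> apple </div>
<div class="half" style="font-size: 0.8em;text-align: center;width: 28%;"> peach </div>
<div class="half" style="font-size: 0.8em;text-align: right;width: 33%;" title="nofruit"> cucumber </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# The colon sits inside a quoted attribute value, so no escaping is needed.
parents = soup.select('div[style="display: flex"]')

# "> div" limits the match to direct children of the flex container.
children = soup.select('div[style="display: flex"] > div')

names = [child.get_text(strip=True) for child in children]
print(len(parents), names)
```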
Old answer:
I created an issue at the BeautifulSoup issue tracker and will update the answer if there are any updates to the Launchpad issue.