Is it possible to extract the embedded CSS properties from an HTML tag? For instance, suppose I want to find out what the vertical-align property for "s5" is.
I'm currently using BeautifulSoup and have retrieved the span tag with tag=soup.find(class_="s5"). I've tried tag.attrs["class"], but that just gives me ['s5'], with no way to link it to the embedded style. Is it possible to do this in Python? Every question of this sort that I've found involves parsing inline CSS styles.
<html>
<head>
<style type="text/css">
* {margin:0; padding:0; text-indent:0; }
.s5 {color: #000; font-family:Verdana, sans-serif;
font-style: normal; font-weight: normal;
text-decoration: none; font-size: 17.5pt;
vertical-align: 10pt;}
</style>
</head>
<body>
<p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
This is a sample sentence. <span class="s5"> 1</span>
</p>
</body>
</html>
You can use a CSS parser like [cssutils][1]. I don't know if the package itself has a function for this (can someone comment regarding this?), but I wrote a custom function that looks the property up.
from bs4 import BeautifulSoup
import cssutils
html='''
<html>
<head>
<style type="text/css">
* {margin:0; padding:0; text-indent:0; }
.s5 {color: #000; font-family:Verdana, sans-serif;
font-style: normal; font-weight: normal;
text-decoration: none; font-size: 17.5pt;
vertical-align: 10pt;}
</style>
</head>
<body>
<p class="s1" style="padding-left: 7pt; text-indent: 0pt; text-align:left;">
This is a sample sentence. <span class="s5"> 1</span>
</p>
</body>
</html>
'''
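def get_property(class_name, property_name):
    # Walk the parsed stylesheet (the global `sheet` defined below).
    for rule in sheet:
        # Only style rules have selectorText; match the exact selector ".<class_name>".
        if getattr(rule, 'selectorText', None) == '.' + class_name:
            for prop in rule.style:
                if prop.name == property_name:
                    return prop.value
    # No matching rule or property was found.
    return None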
soup=BeautifulSoup(html,'html.parser')
sheet=cssutils.parseString(soup.find('style').text)
vl=get_property('s5','vertical-align')
print(vl)
Output
10pt
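This is not perfect: it only matches a rule whose selector is exactly .s5, so grouped selectors such as ".s4, .s5" and rules that apply through * are ignored. Maybe you can improve upon it.
[1]: https://pypi.org/project/cssutils/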
To improve upon the cssutils answer:
For an inline style="..." attribute:
import cssutils
# get the style from beautiful soup, like:
# style = tag['style']
style = "color: hotpink; background-color:#ff0000; visibility:hidden"
parsed_style = cssutils.parseStyle(style)
Now use parsed_style like you would a dict:
print(parsed_style['color']) # hotpink
print(parsed_style['background-color']) # #f00 (cssutils shortens the hex color by default)
print(parsed_style['visibility']) # hidden