Remove all inline styles using BeautifulSoup

Question

I'm doing some HTML cleaning with BeautifulSoup, and I'm new to both Python and BeautifulSoup. I've got tags being removed correctly as follows, based on an answer I found elsewhere on Stack Overflow:

[s.extract() for s in soup('script')]

But how do I remove inline styles? For instance, the following:

<p class="author" id="author_id" name="author_name" style="color:red;">Text</p>
<img class="some_image" href="somewhere.com">

Should become:

<p>Text</p>
<img href="somewhere.com">

How do I delete the class, id, name and style attributes of all elements?

Answers to other similar questions I could find all mentioned using a CSS parser to handle this, rather than BeautifulSoup, but as the task is simply to remove rather than manipulate the attributes, and is a blanket rule for all tags, I was hoping to find a way to do it all within BeautifulSoup.

jmk · Accepted Answer

You don't need to parse any CSS if you just want to remove it all. BeautifulSoup provides a way to remove entire attributes like so:

for tag in soup():
    for attribute in ["class", "id", "name", "style"]:
        del tag[attribute]
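For a complete run against the markup from the question, here is a minimal sketch. The helper name strip_attributes and the choice of the built-in "html.parser" are my assumptions, not part of the answer above. It uses tag.attrs.pop(name, None) instead of del so that a missing attribute is skipped silently whatever the library version.

```python
from bs4 import BeautifulSoup

html = (
    '<p class="author" id="author_id" name="author_name" style="color:red;">Text</p>'
    '<img class="some_image" href="somewhere.com">'
)


def strip_attributes(markup, names=("class", "id", "name", "style")):
    """Return markup with the named attributes removed from every tag.

    strip_attributes is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(True):  # True matches every tag in the tree
        for name in names:
            # tag.attrs is a plain dict; pop with a default ignores
            # attributes that this tag does not have.
            tag.attrs.pop(name, None)
    return str(soup)


cleaned = strip_attributes(html)
print(cleaned)
```

Note that the parser may serialize the img tag in self-closing form, so the output can differ cosmetically from the hand-written target in the question.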

Also, if you just want to delete entire tags (and their contents), you don't need extract(), which removes the tag from the tree and returns it to you. You just need decompose(), which removes the tag and destroys it:

for tag in soup("script"):
    tag.decompose()
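To make the difference concrete, here is a small sketch. The sample markup is made up for illustration and "html.parser" is an assumed parser choice. extract() hands back the detached tag so you can still inspect it, while decompose() simply discards it.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration.
soup = BeautifulSoup(
    "<div><script>a()</script><script>b()</script><p>Keep</p></div>",
    "html.parser",
)

# extract() detaches the tag from the tree and returns it.
removed = soup.find("script").extract()
removed_source = str(removed.string)  # the detached tag is still readable

# decompose() removes the tag and destroys it; nothing useful is returned.
for tag in soup.find_all("script"):
    tag.decompose()

result = str(soup)
print(removed_source)  # a()
print(result)          # <div><p>Keep</p></div>
```

If you never look at the removed tags, decompose() states the intent more directly.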

Not a big difference, but just something else I found while looking at the docs. You can find more details about the API in the BeautifulSoup documentation, with many examples.

Jonathan Vanasco · Answer

I wouldn't do this in BeautifulSoup - you'll spend a lot of time trying, testing, and working around edge cases.

Bleach does exactly this for you. http://pypi.python.org/pypi/bleach

If you were to do this in BeautifulSoup, I'd suggest you go with the "whitelist" approach, like Bleach does. Decide which tags may have which attributes, and strip every tag/attribute that doesn't match.
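A rough sketch of that whitelist idea in BeautifulSoup might look like the following. The ALLOWED table, the sanitize name, and the decision to unwrap disallowed tags (keeping their text) instead of deleting them are all my assumptions for illustration. This is not how Bleach is implemented, and it is not a vetted sanitizer.

```python
from bs4 import BeautifulSoup

# Hypothetical allowlist: tag name -> attributes that tag may keep.
ALLOWED = {
    "p": set(),
    "a": {"href"},
    "img": {"href", "src", "alt"},
}


def sanitize(markup):
    """Keep only allowlisted tags and attributes (illustrative sketch)."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(True):
        if tag.name not in ALLOWED:
            # unwrap() removes the tag itself but keeps its children.
            tag.unwrap()
            continue
        # Copy the keys first, since the dict is modified while looping.
        for name in list(tag.attrs):
            if name not in ALLOWED[tag.name]:
                del tag.attrs[name]
    return str(soup)


dirty = (
    '<div class="x"><p class="author" style="color:red;">Text '
    '<span id="s">more</span></p>'
    '<img class="some_image" href="somewhere.com"></div>'
)
clean = sanitize(dirty)
print(clean)
```

Because unwrap() keeps the contents, tags such as script would leave their source behind as text. You would want to decompose() those before running a pass like this.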

Tags: python, css, beautifulsoup, inline

Asked by Ila