
Python re.sub replace html attributes

Tags: python, html, regex

I'm trying to resize images from html code. This is one example:

My goal is to replace height="108" and width="150" with a height and width of 400. I've tried the following lines, but they don't seem to work:

re.sub(r'width="[0-9]{2,4}"','width="400"',x)
re.sub(r'height="[0-9]{2,4}"','height="400"',x)

Does anyone have a solution for this? Ps: I'm not that good at Regex... :)

Login asked Apr 12 '26 13:04


2 Answers

It does not work because strings are immutable: re.sub returns a new string and leaves x unchanged, and your code discards that result. You can "solve" the issue by assigning the result back to x:

x = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
x = re.sub(r'height="[0-9]{2,4}"','height="400"',x)

That being said, it is a very bad idea to process HTML/XML with regexes. Say you have a tag <foo altwidth="1234">. The regex will change it to <foo altwidth="400">. Do you want that? Probably not.
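
To make the pitfall concrete, here is a minimal sketch. The sample tag and the \b-anchored variant are illustrative assumptions, not part of the question; the sketch only shows how the original pattern also matches inside a longer attribute name.

```python
import re

# Hypothetical sample: one real width attribute and one look-alike attribute.
html = '<foo altwidth="1234" width="150">'

# The naive pattern also matches the 'width="1234"' tail of altwidth.
naive = re.sub(r'width="[0-9]{2,4}"', 'width="400"', html)

# A word boundary (\b) stops a match from starting in the middle of a name.
anchored = re.sub(r'\bwidth="[0-9]{2,4}"', 'width="400"', html)

print(naive)
print(anchored)
```

Note that \b is only a partial fix: a name such as data-width still matches, because the hyphen counts as a word boundary. That is one reason a real parser is the more robust choice.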

You can for instance use BeautifulSoup:

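from bs4 import BeautifulSoup

soup = BeautifulSoup(x, 'lxml')

# Set HTML attributes with item assignment; tag.width would look up a child <width> tag instead.
for tag in soup.find_all(attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all(attrs={"height": True}):
    tag['height'] = '400'
x = str(soup)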

Here we set width="400" on every tag that has a width attribute, and height="400" on every tag that has a height attribute. You can make it more selective, for instance by only accepting <img> tags:

soup = BeautifulSoup(x, 'lxml')

# Passing 'img' as the first argument restricts the search to <img> tags.
for tag in soup.find_all('img', attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all('img', attrs={"height": True}):
    tag['height'] = '400'
x = str(soup)
Willem Van Onsem answered Apr 15 '26 04:04


Seems to be working fine:

>>> x = '<foo width="150" height="108">'
>>> import re
>>> y = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
>>> y
'<foo width="400" height="108">'

Note that re.sub does not mutate x:

>>> x
'<foo width="150" height="108">'
>>> y
'<foo width="400" height="108">'

Perhaps you want to do this instead:

x = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
x = re.sub(r'height="[0-9]{2,4}"','height="400"',x)
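
If you stay with regexes, the two calls could also be folded into one. This is only a sketch of one possible variant, not something from the question: it assumes both attributes should get the same value, and it uses a capture group so the replacement can reuse whichever attribute name matched.

```python
import re

x = '<foo width="150" height="108">'

# Capture the attribute name so the replacement can reuse it via \1.
x = re.sub(r'\b(width|height)="[0-9]{2,4}"', r'\1="400"', x)

print(x)
```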
Paulo Scardine answered Apr 15 '26 02:04


