I'm trying to resize images in HTML code. This is one example:
My goal is to replace height="108" and width="150" with a height and width of 400.
I've tried the following lines, though they don't seem to work:
re.sub(r'width="[0-9]{2,4}"','width="400"',x)
re.sub(r'height="[0-9]{2,4}"','height="400"',x)
Does anyone have a solution for this? Ps: I'm not that good at Regex... :)
The reason it does not work is that strings are immutable: re.sub does not modify x in place, it returns a new string, and you discard that result. You can "solve" the issue by assigning the result back to x:
x = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
x = re.sub(r'height="[0-9]{2,4}"','height="400"',x)
That being said, it is a very bad idea to process HTML/XML with regexes. Say you have a tag <foo altwidth="1234">. The regex will change it to <foo altwidth="400">. Do you want that? Probably not.
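To make the pitfall concrete, here is a minimal sketch that runs the pattern from the question on a made-up input containing the <foo altwidth="1234"> tag. The variable names are only for illustration. Adding a \b word boundary avoids this particular false match, but it is not a general fix: an attribute such as data-width would still be rewritten, because - is not a word character.

```python
import re

html = '<foo altwidth="1234"><img width="150" height="108">'

# The unanchored pattern also matches the tail of altwidth="1234".
naive = re.sub(r'width="[0-9]{2,4}"', 'width="400"', html)

# \b requires a word boundary before "width", so "altwidth" is skipped.
bounded = re.sub(r'\bwidth="[0-9]{2,4}"', 'width="400"', html)

print(naive)
print(bounded)
```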
You can for instance use BeautifulSoup:
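from bs4 import BeautifulSoup

soup = BeautifulSoup(x, 'lxml')
# Attributes are set with tag['width']; tag.width would look up a child <width> tag instead.
for tag in soup.find_all(attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all(attrs={"height": True}):
    tag['height'] = '400'
x = str(soup)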
Here we set width="400" on every tag that has a width attribute, and height="400" on every tag that has a height attribute. You can make it more selective by, for instance, only accepting <img> tags:
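from bs4 import BeautifulSoup

soup = BeautifulSoup(x, 'lxml')
# Only <img> tags that already carry the attribute are touched.
for tag in soup.find_all('img', attrs={"width": True}):
    tag['width'] = '400'
for tag in soup.find_all('img', attrs={"height": True}):
    tag['height'] = '400'
x = str(soup)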
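A self-contained sketch of the <img>-only approach follows, so the result can be checked on a small input. The helper name resize_images and the sample markup are hypothetical. It also assumes the built-in "html.parser" backend instead of 'lxml': with 'lxml', str(soup) wraps a fragment in <html><body> tags, which may not be what you want when x is only a snippet.

```python
from bs4 import BeautifulSoup

def resize_images(html, size="400"):
    """Set width and height to `size` on every <img> that already has them."""
    # html.parser ships with Python and does not wrap fragments in <html><body>.
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("img"):
        for attr in ("width", "height"):
            if tag.has_attr(attr):
                tag[attr] = size  # item assignment changes the HTML attribute
    return str(soup)

x = '<p><img src="a.png" width="150" height="108"><foo altwidth="1234"></foo></p>'
result = resize_images(x)
print(result)
```

The <foo altwidth="1234"> tag is left alone here, because the parser matches on the tag name and the exact attribute name rather than on a substring.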
Seems to be working fine:
>>> x = '<foo width="150" height="108">'
>>> import re
>>> y = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
>>> y
'<foo width="400" height="108">'
Note that re.sub does not mutate x:
>>> x
'<foo width="150" height="108">'
>>> y
'<foo width="400" height="108">'
Perhaps you want to do this instead:
x = re.sub(r'width="[0-9]{2,4}"','width="400"',x)
x = re.sub(r'height="[0-9]{2,4}"','height="400"',x)
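If you stay with the regex approach, both substitutions can probably be done in one pass. This is a minimal sketch, assuming the attributes always use double quotes and two- to four-digit values as in the question. The \b word boundary and the capture group are additions that are not part of the original pattern.

```python
import re

x = '<foo width="150" height="108">'

# \1 re-inserts whichever attribute name (width or height) was matched.
x = re.sub(r'\b(width|height)="[0-9]{2,4}"', r'\1="400"', x)

print(x)
```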