I'm new to lxml and I'm trying to figure out how to rewrite links using iterlinks().
import lxml.html

# doc holds the HTML source as a string
html = lxml.html.document_fromstring(doc)
for element, attribute, link, pos in html.iterlinks():
    if attribute == "src":
        link = link.replace('foo', 'bar')
print(lxml.html.tostring(html))
However, this doesn't actually replace the links. I know I can use .rewrite_links, but iterlinks provides more information about each link, so I would prefer to use this.
Thanks in advance.
Instead of just assigning a new (string) value to the variable name link, you have to alter the element itself, in this case by setting its src attribute:
new_src = link.replace('foo', 'bar') # or element.get('src').replace('foo', 'bar')
element.set('src', new_src)
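For completeness, here is a minimal end-to-end sketch of that fix. The sample HTML in doc is made up for illustration, and wrapping iterlinks() in list() is just a precaution so the loop does not modify the tree while the generator is still running; neither comes from the original question.

```python
import lxml.html

# Hypothetical sample document, used only for illustration.
doc = '<html><body><img src="/foo/a.png"><a href="/foo/page">link</a></body></html>'
html = lxml.html.document_fromstring(doc)

# Materialize the links first, then modify the elements they belong to.
for element, attribute, link, pos in list(html.iterlinks()):
    if attribute == "src":
        # Rebinding `link` would only change the local variable,
        # so write the new value back onto the element instead.
        element.set("src", link.replace("foo", "bar"))

# encoding="unicode" makes tostring() return str rather than bytes.
result = lxml.html.tostring(html, encoding="unicode")
print(result)
```

Only the src attribute of the img element changes here; the href on the a element is left alone because the loop filters on attribute == "src".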
Note that if you already know which "links" you are interested in (for example, only img elements), you can also select those elements directly with .findall() (or XPath or CSS selectors) instead of using .iterlinks().
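As a rough sketch of the .findall() route, assuming you only care about img elements (the sample HTML is again invented for the example):

```python
import lxml.html

# Hypothetical sample document, used only for illustration.
doc = '<html><body><img src="/foo/a.png"><img alt="no source"><a href="/foo/page">link</a></body></html>'
html = lxml.html.document_fromstring(doc)

# ".//img" matches every img element anywhere below the root.
for img in html.findall(".//img"):
    src = img.get("src")
    if src is not None:  # skip img elements that have no src attribute
        img.set("src", src.replace("foo", "bar"))

rewritten = lxml.html.tostring(html, encoding="unicode")
print(rewritten)
```

This is shorter when the target elements are known up front, but unlike iterlinks() it will not pick up links in other places such as href attributes or inline styles.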