
How to replace links using lxml and iterlinks

Tags: python, lxml

I'm new to lxml and I'm trying to figure out how to rewrite links using iterlinks().

import lxml.html
html = lxml.html.document_fromstring(doc)  # doc holds the HTML source as a string
for element, attribute, link, pos in html.iterlinks():
    if attribute == "src":
        link = link.replace('foo', 'bar')
print(lxml.html.tostring(html))

However, this doesn't actually replace the links. I know I can use .rewrite_links(), but iterlinks() provides more information about each link, so I would prefer to use it.

Thanks in advance.

cyrus asked Apr 26 '11 10:04


1 Answer

Assigning a new string to the name link only rebinds that local variable; the document is left unchanged. You have to alter the element itself, in this case by setting its src attribute:

new_src = link.replace('foo', 'bar') # or element.get('src').replace('foo', 'bar')
element.set('src', new_src)
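Put together, the loop might look like the sketch below. The sample markup in doc is made up for illustration, and encoding="unicode" is only there so that tostring() returns a str rather than bytes on Python 3.

```python
import lxml.html

# Hypothetical sample input; any HTML string would do.
doc = '<html><body><img src="/foo/a.png"><a href="/foo/page">x</a></body></html>'
html = lxml.html.document_fromstring(doc)

for element, attribute, link, pos in html.iterlinks():
    # Only rewrite links that live in a src attribute.
    if attribute == "src":
        # Write the new value back onto the element; rebinding `link` would do nothing.
        element.set(attribute, link.replace("foo", "bar"))

result = lxml.html.tostring(html, encoding="unicode")
print(result)
```

Only the img src is changed here; the a href still contains foo because its attribute is href, not src.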

Note that if you already know which links you are interested in (for example, only those on img elements), you can also select the elements directly with .findall() (or XPath or CSS selectors) instead of using .iterlinks().
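A minimal sketch of the .findall() variant, again with made-up sample markup, assuming only img elements should be touched:

```python
import lxml.html

# Hypothetical sample input with two src attributes; only the img one should change.
doc = '<html><body><img src="/foo/a.png"><script src="/foo/app.js"></script></body></html>'
html = lxml.html.document_fromstring(doc)

# ".//img" matches every img element below the root.
for img in html.findall(".//img"):
    src = img.get("src")
    if src is not None:  # skip img elements without a src attribute
        img.set("src", src.replace("foo", "bar"))

result = lxml.html.tostring(html, encoding="unicode")
print(result)
```

The script element's src is left alone because it is never selected.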

Steven answered Oct 06 '22 00:10