
BeautifulSoup - modifying all links in a piece of HTML?

I need to be able to modify every single link in an HTML document. I know that I need to use the SoupStrainer but I'm not 100% positive on how to implement it. If someone could direct me to a good resource or provide a code example, it'd be very much appreciated.

Thanks.

Evan Fosmark asked Jan 20 '09 02:01


People also ask

Can BeautifulSoup handle broken HTML?

It is not a real HTML parser but uses regular expressions to dive through tag soup. It is therefore more forgiving in some cases and less so in others. lxml/libxml2 often parses and repairs broken HTML better, but BeautifulSoup has superior support for encoding detection.

Can BeautifulSoup parse HTML?

The HTML content of the webpages can be parsed and scraped with Beautiful Soup.


1 Answer

Maybe something like this would work? (I don't have a Python interpreter in front of me, unfortunately)

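    from bs4 import BeautifulSoup

    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup('<p>Blah blah blah <a href="http://google.com">Google</a></p>', 'html.parser')

    # href=True skips <a> tags without an href, which would otherwise raise KeyError.
    for a in soup.find_all('a', href=True):
        a['href'] = a['href'].replace("google", "mysite")

    result = str(soup)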
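
Since the question mentions SoupStrainer: as far as I know, passing a SoupStrainer as `parse_only` makes Beautiful Soup keep only the matching tags and drop the rest of the document, so it suits collecting links better than rewriting them in place. Here is a rough sketch of both uses. The helper names `rewrite_links` and `list_links` and the sample `HTML` string are made up for illustration, and I'm assuming the built-in `html.parser`.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Sample input (made up): one real link and one anchor without an href.
HTML = '<p>Blah <a href="http://google.com">Google</a> and <a name="top">anchor</a></p>'


def rewrite_links(html, transform):
    """Hypothetical helper: apply transform to every href, keep the whole document."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        a["href"] = transform(a["href"])
    return str(soup)


def list_links(html):
    """Hypothetical helper: parse only the <a> tags and return their href values."""
    only_a_tags = SoupStrainer("a")
    soup = BeautifulSoup(html, "html.parser", parse_only=only_a_tags)
    # Everything outside the <a> tags has been discarded at this point.
    return [a["href"] for a in soup.find_all("a", href=True)]


rewritten = rewrite_links(HTML, lambda url: url.replace("google", "mysite"))
links = list_links(HTML)
```

So if the goal is to change the links and get the full HTML back, a plain `find_all` over the whole soup is probably what you want. The strainer only pays off when the links are all you need.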
Lusid answered Oct 06 '22 09:10