I have a script that replaces a word in the href of an <a> tag. However, I want to remove the link entirely, so that the word Google is left without a link.
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a['href'] = a['href'].replace("google", "mysite")
result = str(soup)
Also, can you find all the words inside the link and put a " " (a space) before and after them? I'm not sure how to do that. I guess it has to be done before the replacing.
Use del a['href'] instead, just like you would on a plain dictionary:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']
gives you:
>>> print str(soup)
<p>Hello <a>Google</a></p>
UPDATE:
If you want to get rid of the <a> tags altogether, you can use the .replaceWithChildren() method:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a.replaceWithChildren()
gives you:
>>> print str(soup)
<p>Hello Google</p>
...and, what you requested in the comment (wrap the text content of the tag with spaces), can be achieved with:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']
    a.setString(' %s ' % a.text)
gives you:
>>> print str(soup)
<p>Hello <a> Google </a></p>
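If BeautifulSoup is not available, the same two ideas (drop the <a> tags but keep their text, and optionally pad that text with spaces) can be sketched with Python 3's standard-library html.parser. This is only an illustration, not BeautifulSoup's API: the names AnchorStripper, strip_links and pad are made up for this example, and the sketch passes through only tags, text and entities (comments and doctypes are dropped).

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Rebuild the HTML while dropping <a> tags but keeping their text.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, pad=""):
        # convert_charrefs=False keeps entities such as &amp; untouched
        super().__init__(convert_charrefs=False)
        self.pad = pad      # string placed before and after the link text
        self.parts = []     # output fragments, joined at the end

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.parts.append(self.pad)          # opening <a ...> becomes padding
        else:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(self.pad)          # closing </a> becomes padding
        else:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def strip_links(html, pad=""):
    """Return html with every <a> tag removed and its text kept."""
    parser = AnchorStripper(pad)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html = '<p>Hello <a href="http://google.com">Google</a></p>'
print(strip_links(html))           # <p>Hello Google</p>
print(strip_links(html, pad=" "))  # <p>Hello  Google </p>
```

Note that with pad=" " the space already present after "Hello" is kept, so two spaces end up before "Google".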
You can use bleach. Install it with:
pip install bleach
Then use it like this:
import bleach
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<a href = "somesite.com">hello world</a>')
clean = bleach.clean(str(soup), tags=[], strip=True)
This results in:
>>> clean
u'hello world'
See the bleach docs for details.