
How to remove all a href tags from text

I have a script that replaces a word in the href attribute of an <a> tag. However, I want to remove the link entirely, so that only the word Google remains, without a link.

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a['href'] = a['href'].replace("google", "mysite")
result = str(soup)

Also, is it possible to find all the words inside <a href> tags and put a " " before and after them? I'm not sure how to do that. I guess it has to happen before the replacing.

user2784753 asked Sep 29 '13 17:09



2 Answers

Use del a['href'] instead, just like you would on a plain dictionary:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']

gives you:

>>> print str(soup)
<p>Hello <a>Google</a></p>

UPDATE:

If you want to get rid of the <a> tags altogether, you can use the .replaceWithChildren() method:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    a.replaceWithChildren()

gives you:

>>> print str(soup)
<p>Hello Google</p>
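
For readers without the old BeautifulSoup 3 package used above, the same "drop the tag, keep its children" idea can be sketched with only the Python 3 standard library. This is a rough illustration, not a replacement for a real HTML library: `AnchorStripper` and `strip_anchors` are hypothetical names, and the sketch silently discards comments and doctype declarations.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Re-emit HTML as written, except that <a> and </a> tags are dropped.

    Hypothetical helper: comments and doctype declarations are not copied.
    """

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied once, without a close tag
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def strip_anchors(html):
    """Return html with every <a> tag removed but its contents kept."""
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_anchors('<p>Hello <a href="http://google.com">Google</a></p>'))
# prints: <p>Hello Google</p>
```

If you are on the newer bs4 package, the method that does this is called `unwrap()`.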

To do what you asked for in the comment (wrap the text content of the tag in spaces), you can use:

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<p>Hello <a href="http://google.com">Google</a></p>')
for a in soup.findAll('a'):
    del a['href']
    a.setString(' %s ' % a.text)

gives you:

>>> print str(soup)
<p>Hello <a> Google </a></p>
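
The padding step can also be sketched with the Python 3 standard library alone. This is only an illustration under a few assumptions: `AnchorPadder`, `pad_anchors` and the `keep_tag` flag are hypothetical names, the padding goes around the whole contents of each <a> element (not around each text node), self-closing tags are copied through untouched, and comments are dropped.

```python
from html.parser import HTMLParser


class AnchorPadder(HTMLParser):
    """Pad the contents of every <a> element with one space on each side.

    keep_tag=True emits a bare <a> with all attributes dropped;
    keep_tag=False removes the tag and keeps only the padded contents.
    Both the class and the flag are hypothetical names for this sketch.
    """

    def __init__(self, keep_tag=True):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.keep_tag = keep_tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.parts.append("<a> " if self.keep_tag else " ")
        else:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through as written
        self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            self.parts.append(" </a>" if self.keep_tag else " ")
        else:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def pad_anchors(html, keep_tag=True):
    """Return html with the contents of each <a> padded by spaces."""
    parser = AnchorPadder(keep_tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


html = '<p>Hello <a href="http://google.com">Google</a></p>'
print(pad_anchors(html))                  # <p>Hello <a> Google </a></p>
print(pad_anchors(html, keep_tag=False))  # <p>Hello  Google </p>
```

With `keep_tag=False` the result has two spaces after "Hello", because the original text already had one before the link.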
Erik Kaplun answered Oct 09 '22 16:10



You can use bleach, an HTML sanitizing library:

pip install bleach

Then use it like this:

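import bleach
from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup('<a href = "somesite.com">hello world</a>')
# bleach.clean() takes text, so pass the markup as a string.
# tags=[] allows no tags at all; strip=True removes them instead of escaping them.
clean = bleach.clean(str(soup), tags=[], strip=True)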

This results in:

>>> clean
u'hello world'

See the bleach documentation for details.

Pdksock answered Oct 09 '22 15:10
