
Remove all <a> tags

I scraped a container that includes links, for example:

<a href="url">text</a>

I need all of these <a> tags removed so that only the link text remains.

from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

site = "http://mysite.com"
page = urlopen(site)
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly

Is it possible?
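Because the page is already parsed with BeautifulSoup, one option is to let BeautifulSoup do the work itself. The sketch below assumes bs4 4.x with the built-in "html.parser"; the helper name remove_links is hypothetical. Tag.unwrap() replaces a tag with its own children, so the link text stays in place while the <a> element disappears.

```python
from bs4 import BeautifulSoup


def remove_links(html: str) -> str:
    """Hypothetical helper: drop every <a> tag but keep its text."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list up front, so unwrapping while iterating is safe
    for a in soup.find_all("a"):
        # unwrap() swaps the tag for its children (the link text)
        a.unwrap()
    return str(soup)


print(remove_links('<p>See <a href="url">text</a> here</p>'))
# <p>See text here</p>
```

If only the plain text is wanted (no markup at all), soup.get_text() is simpler.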

asked Apr 09 '26 13:04 by a1204773



1 Answer

You can do this with Bleach, which is available on PyPI:

>>> import bleach

>>> bleach.clean('an <script>evil()</script> example')
u'an &lt;script&gt;evil()&lt;/script&gt; example'

>>> bleach.linkify('an http://example.com url')
u'an <a href="http://example.com" rel="nofollow">http://example.com</a> url'

>>> # delinkify() exists only in older Bleach releases; it was removed later
>>> bleach.delinkify('a <a href="http://ex.mp">link</a>')
u'a link'
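Since delinkify() is gone from current Bleach, a rough equivalent is bleach.clean() with every default allowed tag except <a>, plus strip=True so disallowed tags are dropped (keeping their text) instead of being escaped. This is a sketch assuming a recent Bleach release; strip_links is a hypothetical helper name, not part of Bleach.

```python
import bleach
from bleach.sanitizer import ALLOWED_TAGS


def strip_links(html: str) -> str:
    """Hypothetical helper approximating the old bleach.delinkify()."""
    # Start from Bleach's default allowed tags and take <a> out.
    # set() keeps this working whether ALLOWED_TAGS is a list or a frozenset.
    tags = set(ALLOWED_TAGS) - {"a"}
    # strip=True removes disallowed tags but keeps their inner text
    return bleach.clean(html, tags=tags, strip=True)


print(strip_links('a <a href="http://ex.mp">link</a>'))
```

Note that bleach.clean() still sanitizes everything else, so tags outside the default allowed list are stripped too.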
answered Apr 12 '26 18:04 by Jonathan Vanasco



