How can I get href links from HTML using Python?

    import urllib2

    website = "WEBSITE"
    openwebsite = urllib2.urlopen(website)  # open the page
    html = openwebsite.read()               # read the response body

    print html

So far so good.

But I only want the href links from the HTML text. How can I extract them?

asked Jun 19 '10 12:06

user371012

1 Answer

Try Beautiful Soup:

    from BeautifulSoup import BeautifulSoup
    import urllib2
    import re

    html_page = urllib2.urlopen("http://www.yourwebsite.com")
    soup = BeautifulSoup(html_page)
    for link in soup.findAll('a'):
        print link.get('href')

If you only want links starting with http://, use:

    soup.findAll('a', attrs={'href': re.compile("^http://")})
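Note that the pattern `^http://` skips `https://` links. Here is a minimal sketch of a broader pattern, checked against a hypothetical list of href values (the sample URLs and the names `ABSOLUTE_LINK` and `hrefs` are made up for illustration). The same compiled pattern could presumably be passed as the `href` filter in the call above.

```python
import re

# Matches absolute links that use either http:// or https://.
ABSOLUTE_LINK = re.compile(r"^https?://")

# Hypothetical href values, as they might come back from link.get('href').
hrefs = [
    "http://example.com/a",
    "https://example.com/b",
    "/relative/path",
    "mailto:someone@example.com",
    None,  # an <a> tag without an href attribute yields None
]

# Keep only the absolute http(s) links; skip None before matching.
absolute_links = [h for h in hrefs if h and ABSOLUTE_LINK.match(h)]
print(absolute_links)
```

Relative links and non-HTTP schemes such as `mailto:` are dropped by this filter, which may or may not be what you want.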

In Python 3 with Beautiful Soup 4 (bs4), use:

    from bs4 import BeautifulSoup
    import urllib.request

    html_page = urllib.request.urlopen("http://www.yourwebsite.com")
    soup = BeautifulSoup(html_page, "html.parser")
    for link in soup.find_all('a'):
        print(link.get('href'))
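If installing Beautiful Soup is not an option, the standard library can do a similar job. This is only a sketch, not part of the answer above: the class `LinkCollector`, the helper `extract_links`, and the sample HTML are hypothetical names chosen for illustration. It collects the `href` of each `<a>` tag with `html.parser.HTMLParser` and resolves relative links against the page URL with `urllib.parse.urljoin`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url=""):
    """Return all href values found in html_text, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = (
    '<p><a href="/about">About</a> '
    '<a href="http://example.com/x">X</a> '
    '<a name="top">no href</a></p>'
)
print(extract_links(sample, "http://www.yourwebsite.com/index.html"))
```

To run this on a fetched page, the bytes returned by `urllib.request.urlopen(url).read()` would first need decoding to text, for example with `.decode("utf-8", errors="replace")`, assuming the page is UTF-8.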

answered Sep 18 '22 15:09

systempuntoout

Tags: python, html, href, hyperlink, beautifulsoup
