Find specific links with BeautifulSoup

Tags: python, regex, beautifulsoup

Hi, I cannot for the life of me figure out how to find links that begin with certain text. findAll('a') works fine, but it returns way too much. I just want to make a list of all links that begin with http://www.nhl.com/ice/boxscore.htm?id=

Can anyone help me?

Thank you very much

asked Oct 11 '11 21:10

Jen Scott

1 Answer

First set up a test document and open up the parser with BeautifulSoup:

>>> from BeautifulSoup import BeautifulSoup
>>> doc = '<html><body><div><a href="something">yep</a></div><div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div><a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a></body></html>'
>>> soup = BeautifulSoup(doc)
>>> print soup.prettify()
<html>
 <body>
  <div>
   <a href="something">
    yep
   </a>
  </div>
  <div>
   <a href="http://www.nhl.com/ice/boxscore.htm?id=3">
    somelink
   </a>
  </div>
  <a href="http://www.nhl.com/ice/boxscore.htm?id=7">
   another
  </a>
 </body>
</html>

Next, we can search for all <a> tags with an href attribute starting with http://www.nhl.com/ice/boxscore.htm?id=. You can use a regular expression for it:

>>> import re
>>> soup.findAll('a', href=re.compile(r'^http://www\.nhl\.com/ice/boxscore\.htm\?id='))
[<a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a>, <a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>]
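
The session above targets Python 2 and the old BeautifulSoup 3 package. As a rough sketch of the same idea on Python 3, assuming the beautifulsoup4 package (imported as bs4) is installed, the following builds the plain list of URLs the question asks for. The names PREFIX, links, and links_plain are made up for illustration; re.escape is used here so the literal prefix does not have to be escaped by hand.

```python
import re

from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

# Literal prefix taken from the question.
PREFIX = "http://www.nhl.com/ice/boxscore.htm?id="

# Same test document as in the answer above.
doc = (
    '<html><body><div><a href="something">yep</a></div>'
    '<div><a href="http://www.nhl.com/ice/boxscore.htm?id=3">somelink</a></div>'
    '<a href="http://www.nhl.com/ice/boxscore.htm?id=7">another</a>'
    '</body></html>'
)

soup = BeautifulSoup(doc, "html.parser")

# re.escape makes '.' and '?' match literally; '^' anchors at the start of the value.
pattern = re.compile("^" + re.escape(PREFIX))

# find_all is the bs4 spelling of findAll; href=<regex> filters on the attribute value.
links = [a["href"] for a in soup.find_all("a", href=pattern)]
print(links)

# Alternative without a regex: a function that receives the attribute value
# (None when the tag has no href) and returns True for tags to keep.
links_plain = [
    a["href"]
    for a in soup.find_all(
        "a", href=lambda value: value is not None and value.startswith(PREFIX)
    )
]
print(links_plain)
```

Both lists should contain only the two boxscore URLs, in document order. The function-based filter may be easier to read when the prefix is a fixed string and no real pattern matching is needed.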

answered Oct 10 '22 04:10

jterrace
