Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to write a python script to search a website html for matching links

Tags:

python

scrape

I am not too familiar with python and have to write a script to perform a host of functions. Basically the module i still need is how to check a website code for matching links provided beforehand.

like image 661
GeminiDNK Avatar asked Jan 23 '23 11:01

GeminiDNK


1 Answers

Matching links what? Their HREF attribute? The link display text? Perhaps something like:

from BeautifulSoup import BeautifulSoup, SoupStrainer
import re
import urllib2

doc = urllib2.urlopen("http://somesite.com").read()
links = SoupStrainer('a', href=re.compile(r'^test'))
soup = [str(elm) for elm in BeautifulSoup(doc, parseOnlyThese=links)]
for elm in soup:
    print elm

That will grab the HTML content of somesite.com and then parse it using BeautifulSoup, looking only for links whose HREF attribute starts with "test". It then builds a list of these links and prints them out.

You can modify this to do anything using the documentation.

like image 76
Nick Presta Avatar answered Jan 29 '23 16:01

Nick Presta