How can I use BeautifulSoup to find all the links in a page pointing to a specific domain?

Juanjo Conti asked Jan 28 '10 00:01

1 Answer

Use SoupStrainer, which restricts parsing to the tags you care about:

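from BeautifulSoup import BeautifulSoup, SoupStrainer
import re

# BeautifulSoup 3 API. In bs4 the import is `from bs4 import ...`
# and the keyword argument is parse_only instead of parseOnlyThese.
# doc is assumed to hold the page's HTML source as a string.

# Find all links: parse only the <a> tags
links = SoupStrainer('a')
[tag for tag in BeautifulSoup(doc, parseOnlyThese=links)]

# Find only the links whose href contains "example.com/"
# (the dot is escaped so that it matches a literal ".")
linkstodomain = SoupStrainer('a', href=re.compile(r'example\.com/'))
[tag for tag in BeautifulSoup(doc, parseOnlyThese=linkstodomain)]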

Edit: Modified the example from the official documentation.
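
A regex on the raw href also matches hosts such as notexample.com and URLs that only mention the domain in a query string. If that matters, one option is to compare the parsed hostname instead. The following is a minimal sketch using only the standard library; the helper name points_to_domain and the sample hrefs are made up for illustration, and treating subdomains as matches is an assumption about what "pointing to a domain" should mean.

```python
from urllib.parse import urlparse


def points_to_domain(href, domain):
    """Return True if href's host is `domain` or one of its subdomains.

    Hypothetical helper, not part of BeautifulSoup.
    """
    if not href:
        # Covers <a> tags that have no href attribute at all
        return False
    # hostname is None for relative links, so fall back to ""
    host = urlparse(href).hostname or ""
    return host == domain or host.endswith("." + domain)


# Sample href values, as they might be read from <a> tags
hrefs = [
    "http://example.com/page",
    "https://www.example.com/a?b=1",
    "http://notexample.com/",
    "/relative/path",
    "http://other.org/?next=example.com/",
]

matches = [h for h in hrefs if points_to_domain(h, "example.com")]
print(matches)
```

In bs4, attribute filters may also be functions, so something like find_all('a', href=lambda h: points_to_domain(h, 'example.com')) should work there; this has not been checked against the BeautifulSoup 3 API used above.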

viksit answered Oct 13 '22 00:10
