Get the href text of a link that has a certain class attribute using BeautifulSoup in Python

How do I get just the href value from an anchor tag that matches a given class? For example, if I have

<a href="Link_I_Need.html" class="Unique_Class_Name">link text</a>

how can I get the string Link_I_Need.html from only the anchor tag with the class Unique_Class_Name?

ddschmitz asked Feb 15 '16 18:02


2 Answers

Use the .find() or .find_all() method to select the element(s) that have both an href attribute and a class attribute of Unique_Class_Name. Then iterate over the elements and read each href attribute value:

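from bs4 import BeautifulSoup

# html holds the page markup; name the parser explicitly to avoid a warning
soup = BeautifulSoup(html, 'html.parser')
# 'href': True matches only anchors that actually have an href attribute
anchors = soup.find_all('a', {'class': 'Unique_Class_Name', 'href': True})

for anchor in anchors:
    print(anchor['href'])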

You could alternatively use a basic CSS selector with the .select() method:

soup = BeautifulSoup(html, 'html.parser')

# select() takes a CSS selector: <a> elements with class Unique_Class_Name
for anchor in soup.select('a.Unique_Class_Name'):
    if anchor.has_attr('href'):
        print(anchor['href'])
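
To check both approaches end to end, here is a minimal sketch that runs them against the anchor from the question. The two extra anchors in the sample markup and the helper names hrefs_for_class and hrefs_for_class_css are assumptions added for illustration, not part of the original answer; html.parser is the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Sample markup: the first anchor comes from the question; the other two
# are made up here to show what gets filtered out.
HTML = """
<a href="Link_I_Need.html" class="Unique_Class_Name">link text</a>
<a href="Other.html" class="Some_Other_Class">other link</a>
<a class="Unique_Class_Name">no href here</a>
"""


def hrefs_for_class(html, class_name):
    """Hypothetical helper: href values of <a> tags with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only anchors that actually carry an href attribute.
    return [a["href"] for a in soup.find_all("a", class_=class_name, href=True)]


def hrefs_for_class_css(html, class_name):
    """Same result via a CSS selector; [href] requires the attribute."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select(f"a.{class_name}[href]")]


print(hrefs_for_class(HTML, "Unique_Class_Name"))
print(hrefs_for_class_css(HTML, "Unique_Class_Name"))
```

Putting [href] in the selector is one way to fold the has_attr('href') check into the selection itself; either form should give the same list.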

Josh Crozier answered Oct 20 '22 00:10


<a class="blueText" href="/info/046386294000000899/?s_bid=046386294000000899&amp;s_sid=FSP-LSR-002&amp;s_fr=V01&amp;s_ck=C01" target="_blank">川村商店</a>

You can get just the link text (here 川村商店) like this:

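import bs4
import requests

# url_list is assumed to be a list of page URLs defined elsewhere
for url in url_list:
    res = requests.get(url)
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    # class_ (with trailing underscore) filters on the CSS class
    for p in soup.find_all('a', class_='blueText'):
        print(p.text)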
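
Since the question asks for the href rather than the link text, the sketch below reads both from the sample anchor above. It parses the markup from a string so it runs without network access; the helper name text_and_href is an assumption for illustration only.

```python
from bs4 import BeautifulSoup

# The sample anchor from this answer, kept as a string so no request is needed.
HTML = (
    '<a class="blueText" href="/info/046386294000000899/'
    '?s_bid=046386294000000899&amp;s_sid=FSP-LSR-002&amp;s_fr=V01&amp;s_ck=C01" '
    'target="_blank">川村商店</a>'
)


def text_and_href(html, class_name):
    """Hypothetical helper: (link text, href) pairs for anchors with the class."""
    soup = BeautifulSoup(html, "html.parser")
    # .get("href") returns None instead of raising if the attribute is missing.
    return [
        (a.get_text(strip=True), a.get("href"))
        for a in soup.find_all("a", class_=class_name)
    ]


pairs = text_and_href(HTML, "blueText")
for text, href in pairs:
    print(text, href)
```

Note that the parser decodes the &amp; entities, so the returned href contains plain & characters.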

Ryosuke Hujisawa answered Oct 19 '22 23:10