
Scraping for href links

I'm trying to collect the specific link on this page that matches all of my keywords. So far I have:

from bs4 import BeautifulSoup
import random
url = 'http://www.thenextdoor.fr/en/4_adidas-originals'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
raw = soup.findAll('a', {'class':'add_to_compare'})
links = raw['href']
keyword1 = 'adidas'
keyword2 = 'thenextdoor'
keyword3 = 'uncaged'
for link in links:
    text = link.text
    if keyword1 in text and keyword2 in text and keyword3 in text:

I'm trying to extract this link

ColeWorld asked Mar 21 '26 04:03


1 Answer

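You can check whether every keyword is present with all(), or whether at least one is present with any(). Two other things to watch: findAll returns a list of tags, so the href has to be read from each tag individually, and the substring test is case-sensitive, so the keyword has to be 'Uncaged' rather than 'uncaged' to match the URL.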

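from bs4 import BeautifulSoup
import requests

# Download the page; .content holds the raw bytes of the response
res = requests.get("http://www.thenextdoor.fr/en/4_adidas-originals").content
# Name the parser explicitly instead of letting bs4 pick one
soup = BeautifulSoup(res, 'lxml')

# find_all returns a list of tags, so read href from each tag individually
atags = soup.find_all('a', {'class':'add_to_compare'})
links = [atag['href'] for atag in atags]
# Matching is case-sensitive: the URL contains 'Uncaged', not 'uncaged'
keywords = ['adidas', 'thenextdoor', 'Uncaged']

for link in links:
    # Print only the links that contain every keyword
    if all(keyword in link for keyword in keywords):
        print(link)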

Output:

http://www.thenextdoor.fr/en/clothing/2042-adidas-originals-Ultraboost-Uncaged-2303002052017.html
http://www.thenextdoor.fr/en/clothing/2042-adidas-originals-Ultraboost-Uncaged-2303002052017.html
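
If you would rather not depend on the exact capitalisation, or want to drop the repeated link, something along these lines might help. This is only a sketch: filter_links and its require_all parameter are names I made up for illustration, and the third URL in sample_links is a hypothetical entry added just to show how any() differs from all(). It works on a plain list of href strings, so no request is made.

```python
def filter_links(links, keywords, require_all=True):
    """Return unique links containing all (or any) of the keywords, ignoring case."""
    # Pick the built-in that decides how the per-keyword checks are combined
    combine = all if require_all else any
    lowered = [keyword.lower() for keyword in keywords]
    seen = set()
    matches = []
    for link in links:
        # Skip links that were already reported, keeping first-seen order
        if link in seen:
            continue
        if combine(keyword in link.lower() for keyword in lowered):
            seen.add(link)
            matches.append(link)
    return matches


# The first two entries repeat the URL from the output above; the third is a
# hypothetical URL that only contains one of the keywords.
sample_links = [
    "http://www.thenextdoor.fr/en/clothing/2042-adidas-originals-Ultraboost-Uncaged-2303002052017.html",
    "http://www.thenextdoor.fr/en/clothing/2042-adidas-originals-Ultraboost-Uncaged-2303002052017.html",
    "http://www.thenextdoor.fr/en/hypothetical-other-product.html",
]
keywords = ["adidas", "thenextdoor", "uncaged"]

all_matches = filter_links(sample_links, keywords)
any_matches = filter_links(sample_links, keywords, require_all=False)
print(all_matches)
print(any_matches)
```

With require_all=True only the Ultraboost link is kept, once. With require_all=False the hypothetical link is kept as well, because it contains 'thenextdoor'.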
Mohammad Yusuf answered Mar 23 '26 18:03


