
"SSL: certificate_verify_failed" error when scraping https://www.thenewboston.com/

I started learning Python recently using The New Boston's videos on YouTube. Everything was going great until I got to his tutorial on making a simple web crawler. I understood it with no problem, but when I run the code I get errors that all seem to be based around "SSL: CERTIFICATE_VERIFY_FAILED". I've been searching for a fix since last night. No one else in the comments on the video or on his website seems to be having the same problem, and even when I use someone else's code from his website I get the same results. I'll post the code I got from the website, since it gives me the same error and the code I wrote myself is a mess right now.

import requests
from bs4 import BeautifulSoup


def trade_spider(max_pages):
    page = 1
    while page <= max_pages:
        # This is the page of popular posts
        url = "https://www.thenewboston.com/forum/category.php?id=15&orderby=recent&page=" + str(page)
        source_code = requests.get(url)
        # Just get the code, no headers or anything
        plain_text = source_code.text
        # BeautifulSoup objects can be sorted through easily
        soup = BeautifulSoup(plain_text, "html.parser")
        # All links that have class='index_singleListingTitles'
        for link in soup.findAll('a', {'class': 'index_singleListingTitles'}):
            href = "https://www.thenewboston.com/" + link.get('href')
            title = link.string  # just the text, not the HTML
            print(href)
            print(title)
            # get_single_item_data(href)
        page += 1


trade_spider(1)

The full error is: ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:645)

I apologize if this is a dumb question. I'm still new to programming, but I seriously can't figure this out. I was thinking about just skipping this tutorial, but it's bothering me that I can't fix this. Thanks!

Bill Jenkins asked Dec 29 '15 01:12





1 Answer

The problem is not in your code but in the website you are trying to access. Looking at the analysis by SSL Labs, you will note:

This server's certificate chain is incomplete. Grade capped to B.

This means that the server configuration is wrong, and that not only Python but several other clients will have problems with this site. Some desktop browsers work around this configuration problem by trying to load the missing certificates from the internet or by filling in with cached certificates. But other browsers and applications will fail too, just like Python.
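
To confirm that the failure comes from the server's certificate chain rather than from the crawler code, one option is to attempt a verified TLS handshake using only the standard library. The following is a minimal sketch, assuming Python 3.7 or later; `check_tls` is a hypothetical helper name. Note that the `ssl` module normally uses the system trust store while requests uses its own bundle, so the two can occasionally disagree.

```python
import socket
import ssl


def check_tls(host, port=443, timeout=10.0):
    """Return None if the handshake verifies, otherwise a short error description.

    Hypothetical helper for illustration only.
    """
    # The default context verifies the chain and the hostname
    # against the system's trusted CA certificates.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        # Raised when the chain cannot be validated, e.g. a missing intermediate.
        return "certificate verify failed: " + str(exc.verify_message)
    except OSError as exc:
        # Any other network or TLS problem (refused connection, timeout, ...).
        return "connection failed: " + str(exc)


# Example usage (performs a real network connection):
#   print(check_tls("www.thenewboston.com"))
```

A result that starts with "certificate verify failed" points at the server's certificate setup, not at the scraping code.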

To work around the broken server configuration, you can explicitly extract the missing certificates and add them to your trust store. Or you can pass the certificate as trusted through the verify argument. From the documentation:

You can pass verify the path to a CA_BUNDLE file or directory with certificates of trusted CAs:

>>> requests.get('https://github.com', verify='/path/to/certfile')  

This list of trusted CAs can also be specified through the REQUESTS_CA_BUNDLE environment variable.
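
Putting the workaround together, here is a minimal sketch, not a definitive implementation. It assumes you have already obtained the missing intermediate certificate in PEM format (for example from the issuing CA's website) and appends it to a copy of the default bundle, so the root certificates stay available. The names `build_ca_bundle`, `fetch`, `intermediate.pem`, and `combined.pem` are hypothetical; `certifi.where()` normally returns the path of the bundle that requests uses by default.

```python
def build_ca_bundle(base_bundle_path, extra_pem_paths, out_path):
    """Write the base bundle plus extra PEM certificates to out_path.

    Hypothetical helper for illustration only.
    """
    with open(out_path, "wb") as out:
        for path in [base_bundle_path, *extra_pem_paths]:
            with open(path, "rb") as src:
                data = src.read()
            # End every file with a newline so PEM blocks do not run together.
            out.write(data if data.endswith(b"\n") else data + b"\n")
    return out_path


def fetch(url, bundle_path):
    """GET url, verifying the server against the certificates in bundle_path."""
    # Third-party import kept local so build_ca_bundle needs only the stdlib.
    import requests

    return requests.get(url, verify=bundle_path)


# Example usage (file names are placeholders):
#   import certifi
#   bundle = build_ca_bundle(certifi.where(), ["intermediate.pem"], "combined.pem")
#   print(fetch("https://www.thenewboston.com/", bundle).status_code)
```

Setting the REQUESTS_CA_BUNDLE environment variable to the path of the combined file before starting Python should have the same effect without passing verify in the code.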

Steffen Ullrich answered Sep 17 '22 14:09
