Sorry if this is a duplicate; I've been looking for answers for about an hour and can't seem to find any. I have a text file full of URLs and I want to check each one to see whether it exists or not. I need some help understanding the error message, and whether there are any ways to fix it or different methods I can use.
Here's my code
import requests

filepath = 'url.txt'
with open(filepath) as fp:
    url = fp.readline()
    count = 1
    while count != 677: # Runs through each line of my txt file
        print(url)
        request = requests.get(url) # Here is where im getting the error
        if request.status_code == 200:
            print('Web site exists')
        else:
            print('Web site does not exist')
        url = url.strip()
        count += 1
And this is the output
http://www.pastaia.co
Traceback (most recent call last):
  File "python", line 9, in <module>
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.pastaia.co%0a', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fca82769e10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I'll throw in some ideas to get you started; whole careers are built around spidering :) By the way, http://www.pastaia.co just seems to be down, and that's a big part of the trick: how to handle the unexpected when crawling the web. Ready? Here we go...
import requests

filepath = 'url.txt'
with open(filepath) as fp:
    for url in fp:
        url = url.strip()  # drop the trailing newline -> that's the %0a in your traceback
        print(url)
        try:
            request = requests.get(url)
            if request.status_code == 200:
                print('Web site exists')
            else:
                print('Web site does not exist')
        except requests.exceptions.RequestException:
            print('Web site does not exist')
A few notes on the changes:
- for loop: you just want to loop over the whole file, right?
- try and except: that way, if a request blows up for whatever reason (and there can be lots: bad DNS, a refused connection, perhaps the page is a .pdf; the web is the wild wild west), the code won't crash. You can move on to the next site in the list and record the error however you'd like.
- A response code of 200 doesn't always mean the page is valid, just that the site returned success, but it's a good place to start.
- Consider adding a user-agent to your request; you may want to mimic a browser, or perhaps have your program identify itself as super bot 9000 (see the sketch below).
- beautifulsoup, if you later want to parse the pages you fetch (quick example at the end): https://www.crummy.com/software/BeautifulSoup/
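Putting those pieces together, here's a rough sketch of what the loop might look like with a custom user-agent, a timeout, and error handling. The header value and the 5-second timeout are just placeholder choices of mine, not anything your setup requires:

import requests

filepath = 'url.txt'

# Identify yourself however you like; this string is just an example.
headers = {'User-Agent': 'super bot 9000'}

with open(filepath) as fp:
    for line in fp:
        url = line.strip()  # drop the trailing newline (the %0a from the traceback)
        if not url:
            continue        # skip blank lines
        try:
            # The timeout keeps one dead host from hanging the whole run.
            response = requests.get(url, headers=headers, timeout=5)
            if response.status_code == 200:
                print(url, 'Web site exists')
            else:
                print(url, 'returned status', response.status_code)
        except requests.exceptions.RequestException as err:
            # Bad DNS, refused connection, timeout, and so on.
            print(url, 'Web site does not exist:', err)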
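And if you eventually want to look inside the pages rather than just check that they respond, this is the kind of thing BeautifulSoup is for. A minimal sketch, assuming beautifulsoup4 is installed and using example.com as a stand-in URL:

import requests
from bs4 import BeautifulSoup

response = requests.get('http://example.com', timeout=5)
soup = BeautifulSoup(response.text, 'html.parser')

# Grab the page title as a quick sanity check on what actually came back.
title = soup.title.string if soup.title else '(no title)'
print(title)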