http://www.leboncoin.fr/montres_bijoux/671762293.htm
I'm trying to open this URL with the following script:
import requests
s = requests.Session()
s.headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36'
s.headers['Host'] = 'www.leboncoin.fr'
url = 'http://www.leboncoin.fr/montres_bijoux/671762293.htm'
r = s.get(url)
print(r.text)
When I run this script, it prints this error in my terminal:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL /montres_bijoux/671762293.htm was not found on this server.</p>
</body></html>
Yet I can open the same URL in my browser and see the content.
What could be the issue?
Without even waiting for your test, I'm pretty confident I know what your bug is.
If I put this URL manually in the function call, it works fine, but if I read it from the file and call the function with it directly, I get the error. I put 3-4 checks in while reading the file, and the URL is coming from the file correctly. I even printed the URL inside the called function, and it arrives there too. I still have no clue what is happening.
Most likely you're reading the URL with something like "for line in file:" or "file.readline", or some other function that preserves newlines. So, what you actually end up with is not this:
url = 'http://www.leboncoin.fr/montres_bijoux/671762293.htm'
… but this:
url = 'http://www.leboncoin.fr/montres_bijoux/671762293.htm\n'
The latter will be escaped by requests into something that's a perfectly good URL for a resource that doesn't exist, hence the 404 error.
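To see what that escaping looks like, here is a minimal sketch using urllib.parse.quote from the standard library. This is only a stand-in for illustration: requests does its own URL preparation, and its exact internals may differ, but the percent-encoding of a newline is the same idea.

```python
from urllib.parse import quote

path = '/montres_bijoux/671762293.htm'
# Assumption: this mimics a line read from a file, trailing newline included.
path_from_file = path + '\n'

encoded_clean = quote(path)
encoded_dirty = quote(path_from_file)

print(encoded_clean)   # /montres_bijoux/671762293.htm
print(encoded_dirty)   # /montres_bijoux/671762293.htm%0A
```

The second path is syntactically valid, so the server accepts the request and simply reports that nothing lives at that address.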
The best way to check this is to print(repr(url)) instead of print(url). This will also find other possible problems, like embedded nonprintable characters. It won't find everything, like Unicode characters that look like "." but actually aren't, but it's a good first test. (And if that doesn't find it, for a second test, copy and paste from the output, quotes and all, into your test script.)
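As a small illustration of why repr helps, the sketch below compares the two forms. The variable names are made up for the example, and the trailing newline is added by hand to imitate a line read from a file.

```python
clean = 'http://www.leboncoin.fr/montres_bijoux/671762293.htm'
# Assumption: a line read from a file usually keeps its trailing newline.
from_file = clean + '\n'

# print() renders the newline as an extra blank line, which is easy to miss.
print(from_file)

# repr() shows the escape sequence explicitly, inside the quotes.
shown = repr(from_file)
print(shown)

# The two strings are not equal, even though they look alike when printed.
same = (clean == from_file)
print(same)
```

With the repr form, the stray \n appears just before the closing quote, so the difference from the hard-coded URL is obvious at a glance.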
If this is the problem, the fix is simple:
url = url.rstrip()
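Putting it together, a sketch of reading URLs from a file might look like the following. The read_urls helper and the in-memory file are hypothetical, used here only so the example runs without a real file on disk.

```python
import io

# Hypothetical stand-in for the real file: one URL per line, plus a blank line.
fake_file = io.StringIO(
    'http://www.leboncoin.fr/montres_bijoux/671762293.htm\n'
    '\n'
)

def read_urls(lines):
    """Return the URLs from an iterable of lines, without trailing whitespace."""
    urls = []
    for line in lines:
        url = line.rstrip()   # drops the trailing newline and any trailing spaces
        if url:               # skip lines that were blank
            urls.append(url)
    return urls

urls = read_urls(fake_file)
print(urls)
```

Each cleaned URL can then be passed to s.get(url) as in the original script.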