I'm using a very basic program to search a query on a website and print out the search results. Why do I get a 502 error?
import requests
from bs4 import BeautifulSoup
import re

def main():
    url = "https://www.last10k.com/Search"
    dat = {'q': 'goog'}
    resp = requests.get(url, params=dat)
    print(resp.content)

main()
Define a User-Agent header, like this:
import requests

def main():
    url = "https://www.last10k.com/Search"
    dat = {'q': 'goog'}
    resp = requests.get(url, params=dat, headers={'User-Agent': 'Mozilla/5.0'})
    print(resp.status_code)

main()
Why this requirement? See the Wikimedia User-Agent policy, which explains why many sites refuse requests that lack a descriptive User-Agent.
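If you want to verify what is actually being sent before blaming the server, you can prepare the request and inspect it without hitting the network. This is a sketch using the question's URL and parameters; the User-Agent string is the same illustrative one as above:

```python
import requests

# Build the request the answer describes, but inspect it instead of sending it.
req = requests.Request(
    'GET',
    'https://www.last10k.com/Search',
    params={'q': 'goog'},
    headers={'User-Agent': 'Mozilla/5.0'},
)
prepared = req.prepare()

print(prepared.url)                    # full URL with the query string appended
print(prepared.headers['User-Agent'])  # confirms the header is attached
```

Once the prepared request looks right, send it with `requests.Session().send(prepared)` or just call `requests.get(...)` with the same arguments.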
I had this problem and found that a mix of inspecting the response content and retrying the request in a browser helped me find the solution. Maybe it will help you too, so here is what I did:
My request succeeded in a browser but failed from Python, with identical URLs. So I used the debugger. You can also simply print things, but the debugger shows everything there is to see and lets you explore what you might otherwise have missed. I found that the response content of the failed Python request was an error message which, after a web search, turned out to come from a Ruby problem on the server.
So the remote side behaved differently, but what caused it? Adding a User-Agent header, as suggested, was a good idea but did not change anything. So I compared the other headers and found that the Basic Authentication string looked completely different.
My solution: due to some refactoring I had done, I was feeding the Python request the wrong auth data, and the remote side handled the resulting "permission denied" badly, which ended up as a 502 instead of a 403.
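The header comparison described above can be done offline as well. This is a minimal sketch (the credentials are made up) showing how to see the exact Basic auth string a request would send, so you can compare it against the one your browser sends:

```python
import requests

# Prepare a request with Basic auth but do not send it; just look at the
# Authorization header that would go over the wire. A typo introduced by
# refactoring shows up here as a different base64 string.
req = requests.Request('GET', 'https://www.last10k.com/Search',
                       auth=('user', 'secret'))
prepared = req.prepare()

print(prepared.headers['Authorization'])  # 'Basic ' + base64('user:secret')
```

Comparing this value against the `Authorization` header from a working browser request (visible in the browser's developer tools) would have exposed the wrong credentials immediately.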