
requests.get(url) returns empty content

I have been trying to figure this out, with no luck so far:

import requests
r = requests.get('http://example.com/m7ppct4', allow_redirects=True)

r.status_code is 200, but r.content is empty ('').

r.headers contains the following:

{'content-length': '0', 
 'content-language': 'en-US', 
 'x-powered-by': 'Servlet/3.0', 
 'set-cookie': '__cfduid=d4b3d47d43189ac72be14b1d2a2bed98a1408989649815; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.azdoa.gov; HttpOnly, LWJSESSIONID=0000SESSIONMANAGEMENTAFFINI:18h1v85u3; Path=/; HttpOnly, NSC_batubufkpctWTTTM=ffffffff09f39f1545525d5f4f58455e445a4a42378b;expires=Mon, 25-Aug-2014 18:02:49 GMT;path=/;secure;httponly', 
 'expires': 'Thu, 01 Dec 1994 16:00:00 GMT', 
 'server': 'cloudflare-nginx', 
 'connection': 'keep-alive', 
 'x-ua-compatible': 'IE=EmulateIE9', 
 'cache-control': 'no-cache="set-cookie, set-cookie2"', 
 'date': 'Mon, 25 Aug 2014 18:00:49 GMT', 
 'cf-ray': '15f9b0ff50cf0d6d-LAX', 
 'content-type': 'application/octet-stream'}

When I open the same URL in a browser, the page loads with content as expected.

Any thoughts on how to proceed with debugging this? I would like to get the page content with a requests.get() call.
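A reasonable first debugging step is to compare what requests actually sends with what the browser sends. The sketch below builds the same kind of GET request that requests.get would issue, without sending it, and returns its headers. By default the User-Agent is "python-requests/<version>" rather than a browser string. The helper name `outgoing_headers` is hypothetical, chosen for illustration.

```python
import requests


def outgoing_headers(url, headers=None):
    """Return the headers requests would send for a GET to `url`, without sending it."""
    with requests.Session() as session:
        # prepare_request merges the session's default headers (including
        # User-Agent) with any headers passed in; no network I/O happens here.
        prepared = session.prepare_request(requests.Request("GET", url, headers=headers))
    return dict(prepared.headers)
```

For example, `outgoing_headers('http://example.com/m7ppct4')['User-Agent']` shows the default value, which you can then compare with the User-Agent your browser's developer tools report for the same page.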

Tammo Heeren, asked Aug 25 '14 18:08


3 Answers

You need to send a User-Agent header; any value will do:

import requests
r = requests.get('http://example.com/m7ppct4', headers={'User-Agent': 'test'})
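If you make several requests to the same site, one option is to set the header once on a requests.Session so that every request made through it carries the header. This is a minimal sketch: the default value "test" mirrors the answer above, and `make_session` is a hypothetical helper name.

```python
import requests


def make_session(user_agent="test"):
    """Create a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    # Overrides the default "python-requests/x.y" User-Agent for every
    # request sent through this session.
    session.headers.update({"User-Agent": user_agent})
    return session
```

Usage: `s = make_session()` followed by `r = s.get('http://example.com/m7ppct4')`. The session also keeps cookies between calls. That may matter here, because the response sets several cookies.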
RaminNietzsche, answered Oct 19 '22 10:10


It looks like the website that the tinyurl link redirects to (azstatejobs) filters requests based on their User-Agent. Spoofing a Chrome user-agent worked for me:

import requests
url = 'http://tinyurl.com/m7ppct4'
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36'
headers = {'User-Agent': user_agent}
r = requests.get(url, headers=headers)

(allow_redirects is True by default for requests.get, so you don't need to pass it.)
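To confirm that the tinyurl redirect was actually followed, inspect r.history, which holds the intermediate responses in order, and r.url, which is the final URL. The small helper below (a hypothetical name, intended only as a debugging aid) lists the whole chain.

```python
import requests


def redirect_chain(response: requests.Response) -> list[tuple[int, str]]:
    """List (status_code, url) for every hop, ending with the final response."""
    # response.history holds the intermediate redirect responses, oldest first.
    hops = [*response.history, response]
    return [(hop.status_code, hop.url) for hop in hops]
```

Usage: `for status, url in redirect_chain(r): print(status, url)`.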

You might want to try different user-agents to find out what it is about the default python-requests user-agent that the website rejects.
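One way to run that experiment is to loop over candidate User-Agent strings and record which ones get a non-empty body back. This is a sketch under a few assumptions. The candidate list is only an example (the library default, the arbitrary "test" value from the first answer, and the Chrome string above). The 10-second timeout is an arbitrary choice. The `get` parameter exists only so the function can be tried without network access. Which strings succeed depends entirely on the server.

```python
import requests

CHROME_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36"
)

# Example candidates only; add whatever browser strings you want to test.
CANDIDATES = [requests.utils.default_user_agent(), "test", CHROME_UA]


def user_agents_with_content(url, user_agents, get=requests.get):
    """Return the User-Agent strings for which `url` answers with a non-empty body."""
    working = []
    for ua in user_agents:
        response = get(url, headers={"User-Agent": ua}, timeout=10)
        if response.content:  # an empty body (b"") is falsy
            working.append(ua)
    return working
```

Usage: `print(user_agents_with_content('http://tinyurl.com/m7ppct4', CANDIDATES))`.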

Wiwiweb, answered Oct 19 '22 09:10


import pprint
import requests

r = requests.get('URL')  # 'URL' is a placeholder for the address you want
# Decode the response body as JSON and pretty-print it
pprint.pprint(r.json())
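Note that with the empty body described in the question, decoding the response as JSON raises a ValueError instead of printing anything. The following is a minimal guard, assuming you pass it the raw r.content bytes. `parse_json_body` is a hypothetical helper name.

```python
import json


def parse_json_body(content):
    """Parse a response body (bytes) as JSON, returning None if the body is empty."""
    if not content:
        # An empty body is not valid JSON; json.loads(b"") would raise.
        return None
    return json.loads(content)
```

Usage: `data = parse_json_body(r.content)`, then check `data is None` before pretty-printing.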
Tangani, answered Oct 19 '22 09:10