Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Inconsistent behavior with HTTP POST requests in Python

Trying to make a POST request between a Python (WSGI) and a NodeJS + Express application. They are on different servers.

The problem is that when using different IP addresses (i.e. private network vs. public network), a urllib2 request on the public network succeeds, but the same request for the private network fails with a 502 Bad Gateway or URLError [32] Broken pipe.

The urllib2 code I'm using is this:

req = urllib2.Request(url, "{'some':'data'}", {'Content-Type' : 'application/json; charset=utf-8'})

res = urllib2.urlopen(req)

print f.read()

Now, I have also coded the request like this, using requests:

r = requests.post(url, headers = {'Content-Type' : 'application/json; charset=utf-8'}, data = "{'some':'data'}")

print r.text

And get a 200 OK response. This alternate method works for both networks.

I am interested in finding out if there is some additional configuration needed for a urllib2 request that I don't know of, or if I need to look into some network configuration which might be missing (I don't believe this is the case, since the alternate request method works, but I could definitely be wrong).

Any suggestions or pointers with this will be greatly appreciated. Thanks!

like image 896
Juan Carlos Coto Avatar asked Feb 04 '13 20:02

Juan Carlos Coto


1 Answers

The problem here is that, as Austin Phillips pointed out, urllib2.Request's constructor's data parameter:

may be a string specifying additional data to send to the server… data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format.

By passing it JSON-encoded data instead of urlencoded data, you're confusing it somewhere.

However, Request has a method add_data:

Set the Request data to data. This is ignored by all handlers except HTTP handlers — and there it should be a byte string, and will change the request to be POST rather than GET.

If you use this, you should probably also use add_header rather than passing it in the constructor, although that doesn't seem to be mentioned specifically anywhere in the documentation.

So, this should work:

req = urllib2.Request(url)
req.add_data("{'some':'data'}")
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)

In a comment, you said:

The reason I don't want to just switch over to requests without finding out why I'm seeing this problem is that there may be some deeper underlying issue that this points to that could come back and cause harder-to-detect problems later on.

If you want to find deep underlying issues, you're not going to do that by just looking at your client-side source. The first step to figuring out "Why does X work but Y fails?" with network code is to figure out exactly what bytes X and Y each send. Then you can try to narrow down what the relevant difference is, and then figure out what part of your code is causing Y to send the wrong data in the relevant place.

You can do this by logging things at the service (if you control it), running Wireshark, etc., but the easiest way, for simple cases, is netcat. You'll need to read man nc for your system (and, on Windows, you'll need to get and install netcat before you can run it), because the syntax is different for each version, but it's always something simple like nc -kl 12345.

Then, in your client, change the URL to use localhost:12345 in place of the hostname, and it'll connect up to netcat and send its HTTP request, which will be dumped to the terminal. You can then copy that and use nc HOST 80 and paste it to see how the real server responds, and use that to narrow down where the problem is. Or, if you get stuck, at least you can copy and paste the data to your SO question.


One last thing: This is almost certainly not relevant to your problem (because you're sending the exact same data with requests and it's working), but your data is not actually valid JSON, because it uses single quotes instead of double quotes. According to the docs, string is defined as:

string
    ""
    " chars "

(The docs have a nice graphical representation as well.)

In general, except for really simple test cases, you don't want to write JSON by hand. In many cases (including yours), all you have to do is replace the "…" with json.dumps(…), so this isn't a serious hardship. So:

req = urllib2.Request(url)
req.add_data(json.dumps({'some':'data'}))
req.add_header('Content-Type', 'application/json; charset=utf-8')
res = urllib2.urlopen(req)

So, why is it working? Well, in JavaScript, single-quoted strings are legal, as well as other things like backslash escapes that aren't valid in JSON, and any JS code that uses restricted-eval (or, worse, raw eval) for parsing will accept it. And, because so many people got used to writing bad JSON because of this, many browsers' native JSON parsers and many JSON libraries in other languages have workarounds to allow common errors. But you shouldn't rely on that.

like image 132
abarnert Avatar answered Oct 31 '22 19:10

abarnert