
Newlines removed in POST request body? (Google App Engine)

I am building a REST API on Google App Engine (not using Endpoints) that will allow users to upload a CSV or tab-delimited file and search for potential duplicates. Since it's an API, I cannot use <form>s or the BlobStore's upload_url. I also cannot rely on having a single web client that will call this API. Instead, ideally, users will send the file in the body of the request.

My problem is, when I try to read the content of a tab-delimited file, I find that all newline characters have been removed, so there is no way of splitting the content into rows.

If I check the content of the file directly in the Python interpreter, I can see that the tabs and newlines are there (output is truncated in this example):

>>> with open('./data/occ_sample.txt') as o:
...     o.read()
... 
'id\ttype\tmodified\tlanguage\trights\n123456\tPhysicalObject\t2015-11-11 11:50:59.0\ten\thttp://creativecommons.org/licenses/by-nc/3.0\n...'

The RequestHandler logs the content of the request body:

import logging

import webapp2


class ReportApi(webapp2.RequestHandler):
    def post(self):
        # Log the raw request body exactly as the handler received it
        logging.info(self.request.body)
        ...

When I call the API running in the dev_appserver via curl:

curl -X POST -d @data/occ_sample.txt http://localhost:8080/api/v0/report

This shows up in the logs:

id  type    modified    language    rights123456    PhysicalObject  2015-11-11 11:50:59.0   en  http://creativecommons.org/licenses/by-nc/3.0

As you can see, there is nothing between the last value of the headers and the first record (rights and 123456 respectively) and the same happens with the last value of each record and the first one of the next.

Am I missing something obvious here? I have tried loading the data with self.request.body, self.request.body_file and self.request.POST, and none seem to work. I also tried applying the Content-Type values text/csv, text/plain, application/csv in the request headers, with no success. Should I add a different Content-Type?

JOT asked Oct 18 '25 12:10


1 Answer

You are using the wrong curl command-line option to send your file data, and it is this option that is stripping the newlines.

The -d option processes your data and sends an application/x-www-form-urlencoded request; when it reads the data from a file, it strips carriage returns and newlines. From the curl manpage:

-d, --data <data>

[...]

If you start the data with the letter @, the rest should be a file name to read the data from, or - if you want curl to read the data from stdin. Multiple files can also be specified. Posting data from a file named 'foobar' would thus be done with --data @foobar. When --data is told to read from a file like that, carriage returns and newlines will be stripped out.

The emphasis on the last sentence of that excerpt is mine.
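The effect described in that manpage excerpt can be reproduced in a few lines of Python. This is only a sketch that mimics the documented behaviour (dropping carriage returns and newlines), not curl's actual implementation; the helper name strip_line_breaks is hypothetical, and the sample string is the truncated content shown in the question.

```python
def strip_line_breaks(data):
    """Mimic the documented effect of `curl -d @file`: drop CR and LF characters."""
    return data.replace("\r", "").replace("\n", "")


# First two lines of the sample file, as shown in the question
sample = (
    "id\ttype\tmodified\tlanguage\trights\n"
    "123456\tPhysicalObject\t2015-11-11 11:50:59.0\ten\t"
    "http://creativecommons.org/licenses/by-nc/3.0\n"
)

stripped = strip_line_breaks(sample)

# Tabs survive, but the last header field and the first id are fused together
print(repr(stripped))
print("rights123456" in stripped)
# Without line breaks the body can no longer be split into rows
print(len(sample.splitlines()), len(stripped.splitlines()))
```

The fused "rights123456" matches what appears in the question's log output, which is consistent with the newlines being lost on the client side before the request ever reaches the handler.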

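Use the --data-binary option instead; for the command in the question, that means replacing -d with --data-binary and leaving the rest unchanged: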

--data-binary <data>

(HTTP) This posts data exactly as specified with no extra processing whatsoever.

If you start the data with the letter @, the rest should be a filename. Data is posted in a similar manner as --data-ascii does, except that newlines and carriage returns are preserved and conversions are never done.

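Note that --data-binary still sends Content-Type: application/x-www-form-urlencoded by default, so you may want to set a Content-Type header explicitly with -H. Whether that matters depends on whether your handler inspects that header.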
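For API clients that are not curl, the same idea (send the file's bytes untouched, with an explicit Content-Type) might look like the sketch below. It is written for Python 3 and the standard library only. The function names build_upload_request and parse_rows are hypothetical, and text/tab-separated-values is just one plausible choice of media type, not something the handler in the question is known to require.

```python
import csv
import io
import urllib.request


def build_upload_request(url, body, content_type="text/tab-separated-values"):
    """Build (but do not send) a POST request whose body is the given raw bytes."""
    return urllib.request.Request(
        url,
        data=body,  # bytes are sent as-is, so newlines are preserved
        headers={"Content-Type": content_type},
        method="POST",
    )


def parse_rows(body, encoding="utf-8"):
    """Split a tab-delimited request body (bytes) into a list of rows."""
    text = body.decode(encoding)
    # newline="" lets the csv module handle line endings itself
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    return list(reader)


# Stand-in for the file content; in practice: open(path, "rb").read()
body = b"id\ttype\n123456\tPhysicalObject\n"

request = build_upload_request("http://localhost:8080/api/v0/report", body)
print(request.get_method(), request.data == body)

# What the receiving side could do with the intact body
print(parse_rows(body))
```

Passing the request to urllib.request.urlopen would actually send it. On the server side, the handler from the question could apply the same kind of parsing to self.request.body once the newlines arrive intact.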

Martijn Pieters answered Oct 21 '25 01:10