 

How do I batch send a multipart HTTP POST with multiple URLs?

I am talking to the Gmail API and would like to batch the requests. Google has a friendly guide for this, https://developers.google.com/gmail/api/guides/batch, which suggests that I should be able to use multipart/mixed and include a different URL in each part.

I am using Python and the Requests library, but I am unsure how to give each part its own URL. Answers such as "How to send a "multipart/form-data" with requests in python?" don't mention an option for changing that part.

How do I do this?

asked Dec 15 '22 17:12 by user592419


1 Answer

Unfortunately, requests does not support multipart/mixed in its API. This has been requested in several GitHub issues (#935 and #1081), but there has been no progress on it so far. Searching the requests source code for "mixed" makes this quite clear: it returns zero results :(
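A quick way to confirm this is to prepare (not send) a request that uses the files parameter and look at what requests generates. In this sketch the URL and the part name part1 are placeholders: the Content-Type always comes out as multipart/form-data, and every part gets a form-data Content-Disposition header.

```python
import requests

# Preparing a request builds its headers and body without sending anything.
prepared = requests.Request(
    'POST',
    'https://example.com/batch',  # placeholder URL, nothing is sent
    files={'part1': ('', 'GET /farm/v1/animals/pony\r\n', 'application/http')},
).prepare()

content_type = prepared.headers['Content-Type']
print(content_type.split(';')[0])  # multipart/form-data
print(b'Content-Disposition: form-data; name="part1"' in prepared.body)  # True
```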

Now you have several options, depending on how much you want to make use of Python and 3rd-party libraries.

Google API Client

Now, the most obvious answer to this problem is to use the official Google API Client Library for Python. It comes with a BatchHttpRequest class that handles exactly the kind of batch requests you need, and the library's batching guide documents it in detail.

Essentially, you create a BatchHttpRequest object and add all your requests to it. The library then assembles everything into one request for you (taken from the guide above):

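from googleapiclient.http import BatchHttpRequest

# callbacks are called as callback(request_id, response, exception)
batch = BatchHttpRequest()  # newer versions prefer service.new_batch_http_request()
batch.add(service.animals().list(), callback=list_animals)
batch.add(service.farmers().list(), callback=list_farmers)
batch.execute(http=http)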

Now, if for whatever reason you cannot or will not use the official Google libraries, you will have to build parts of the request body yourself.

requests + email.mime

As I already mentioned, requests does not officially support multipart/mixed. But that does not mean that we cannot "force" it. When creating a Request object, we can use the files parameter to provide multipart data.

files is a dictionary whose values can be 4-tuples of the form (filename, file_object, content_type, headers). The filename can be empty, and file_object can be a plain string instead of a real file. So what we need is a way to turn a requests.Request object into such a string. I wrote a small method that covers the basic cases from Google's example. It is partly inspired by the internal methods that Google uses in their Python library:

import requests
from email.mime.multipart import MIMEMultipart
from email.mime.nonmultipart import MIMENonMultipart

BASE_URL = 'http://www.googleapis.com/batch'

def serialize_request(request):
    '''Returns the string representation of the request (Python 3)'''
    prepared = request.prepare()

    # write first line (method + path relative to BASE_URL)
    if prepared.url.startswith(BASE_URL):
        mime_body = '%s %s\r\n' % (prepared.method, prepared.url[len(BASE_URL):])
    else:
        mime_body = '%s %s\r\n' % (prepared.method, prepared.url)

    # write headers (if any)
    for key, value in prepared.headers.items():
        mime_body += '%s: %s\r\n' % (key, value)

    # write body (if any); requests stores e.g. JSON bodies as bytes
    body = prepared.body
    if body is not None:
        if isinstance(body, bytes):
            body = body.decode('utf-8')
        mime_body += '\r\n' + body + '\r\n'

    # return a str: email.mime payloads must be text in Python 3
    return mime_body.lstrip()

This method transforms a requests.Request object into a string that can then be used as the payload of a MIMENonMultipart object, i.e. one of the individual parts.
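serialize_request only strips a hard-coded BASE_URL prefix from the URL. If your request URLs do not all share that prefix, a more general alternative (my own assumption, not part of the approach above) is to reduce every absolute URL to its path and query string with urllib.parse.urlsplit. The helper name request_line and the Gmail URL are purely illustrative. Note that, unlike the BASE_URL approach, it keeps the full path, so for the farm URLs used later in this answer it would produce /batch/farm/v1/... rather than /farm/v1/....

```python
from urllib.parse import urlsplit

import requests


def request_line(request):
    '''Returns "METHOD /path?query" for a requests.Request (hypothetical helper).'''
    prepared = request.prepare()  # upper-cases the method and encodes params
    parts = urlsplit(prepared.url)
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    return '%s %s' % (prepared.method, target)


# Illustrative URL; any absolute URL is reduced to its path and query string.
print(request_line(requests.Request(
    'get', 'https://www.googleapis.com/gmail/v1/users/me/messages',
    params={'maxResults': 10})))
# GET /gmail/v1/users/me/messages?maxResults=10
```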

To generate the actual batch request, we first need to squeeze a list of (Google API) requests into a files dictionary for requests. The following method takes a list of requests.Request objects, wraps each one in a MIMENonMultipart and returns a dictionary with the structure that the files parameter expects:

import uuid

def prepare_requests(request_list):
    '''Turns requests.Request objects into a "files" dict for requests'''
    output = {}

    for request in request_list:
        message_id = new_id()
        sub_message = MIMENonMultipart('application', 'http')
        sub_message['Content-ID'] = message_id
        del sub_message['MIME-Version']

        sub_message.set_payload(serialize_request(request))

        # as_string() writes the part headers (Content-Type, Content-ID),
        # a blank line and the serialized request. Python 3 writes bare \n
        # line endings, so normalise them to the \r\n that HTTP expects.
        part = sub_message.as_string()
        part = part.replace('\r\n', '\n').replace('\n', '\r\n')

        output[message_id] = ('', part, 'application/http', {})

    return output

def new_id():
    # I am not sure how these work exactly, so you will have to adapt this code.
    # Use the full UUID: a collision would silently overwrite a files entry.
    return '<item%s:[email protected]>' % uuid.uuid4().hex
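To see what a single entry in the files dictionary contains, here is one part built in isolation with the same email.mime calls that prepare_requests uses (Python 3; the Content-ID <item1> is illustrative). It also shows why the line endings need attention: as_string() rewrites the \r\n from the payload as a bare \n.

```python
from email.mime.nonmultipart import MIMENonMultipart

# One batch part, built with the same email.mime calls as prepare_requests
part = MIMENonMultipart('application', 'http')
part['Content-ID'] = '<item1>'  # illustrative Content-ID
del part['MIME-Version']  # not needed inside a batch part
part.set_payload('GET /farm/v1/animals/pony\r\n')

# as_string() writes headers, a blank line and the payload, with bare \n endings
text = part.as_string()
# normalise first, then switch every line ending to the \r\n that HTTP uses
crlf_text = text.replace('\r\n', '\n').replace('\n', '\r\n')
print(repr(crlf_text))
# 'Content-Type: application/http\r\nContent-ID: <item1>\r\n\r\nGET /farm/v1/animals/pony\r\n'
```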

Finally, we need to change the Content-Type from multipart/form-data to multipart/mixed and remove the Content-Disposition and Content-Type headers that requests writes at the start of each part. requests generates these itself, and they cannot be overridden through the files dictionary.

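import re

def finalize_request(prepared):
    '''Turns the multipart/form-data request built by requests into multipart/mixed'''
    # change to multipart/mixed (the boundary parameter is kept)
    old = prepared.headers['Content-Type']
    prepared.headers['Content-Type'] = old.replace('multipart/form-data', 'multipart/mixed')

    # remove the headers requests writes at the start of each part;
    # prepared.body is bytes in Python 3, so the pattern must be bytes too
    prepared.body = re.sub(rb'\r\nContent-Disposition: form-data; name=.+\r\nContent-Type: application/http\r\n', b'', prepared.body)

    # the body is now shorter, so Content-Length must be updated to match
    prepared.headers['Content-Length'] = str(len(prepared.body))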
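If patching the form-data output feels too fragile, a different technique (not what this answer does) is to skip the files parameter entirely: build the multipart/mixed body yourself and pass it as data together with an explicit Content-Type header. requests then sends the bytes unchanged and computes Content-Length from them. This is a minimal sketch assuming each part is an already-serialized request string like the ones serialize_request returns; build_batch_body, the boundary value and the <itemN> Content-IDs are hypothetical.

```python
import uuid

import requests


def build_batch_body(serialized_requests, boundary=None):
    '''Joins serialized HTTP requests into one multipart/mixed body.

    Hypothetical helper. Returns (body_bytes, content_type_header).
    '''
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for index, request_text in enumerate(serialized_requests, start=1):
        chunks.append('--%s\r\n' % boundary)
        chunks.append('Content-Type: application/http\r\n')
        chunks.append('Content-ID: <item%d>\r\n' % index)  # illustrative IDs
        chunks.append('\r\n')  # blank line ends the part headers
        chunks.append(request_text)
        chunks.append('\r\n')
    chunks.append('--%s--\r\n' % boundary)  # closing boundary
    return ''.join(chunks).encode('utf-8'), 'multipart/mixed; boundary=%s' % boundary


body, content_type = build_batch_body(
    ['GET /farm/v1/animals/pony\r\n', 'GET /farm/v1/animals\r\n'],
    boundary='batch_boundary')
# prepare() only builds the request; send it with requests.Session().send(prepared)
prepared = requests.Request(
    'POST', 'http://www.googleapis.com/batch',
    data=body, headers={'Content-Type': content_type}).prepare()
```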

I tested this as well as I could with the example from Google's batching guide:

sheep = {
  "animalName": "sheep",
  "animalAge": "5",
  "peltColor": "green"
}

commands = []
commands.append(requests.Request('GET', 'http://www.googleapis.com/batch/farm/v1/animals/pony'))
commands.append(requests.Request('PUT', 'http://www.googleapis.com/batch/farm/v1/animals/sheep', json=sheep, headers={'If-Match': '"etag/sheep"'}))
commands.append(requests.Request('GET', 'http://www.googleapis.com/batch/farm/v1/animals', headers={'If-None-Match': '"etag/animals"'}))

files = prepare_requests(commands)

r = requests.Request('POST', 'http://www.googleapis.com/batch', files=files)
prepared = r.prepare()

finalize_request(prepared)

s = requests.Session()
s.send(prepared)

The resulting request should be close enough to the one shown in Google's batching guide:

POST http://www.googleapis.com/batch
Content-Length: 1006
Content-Type: multipart/mixed; boundary=a21beebd15b74be89539b137bbbc7293

--a21beebd15b74be89539b137bbbc7293

Content-Type: application/http
Content-ID: <item8065:[email protected]>

GET /farm/v1/animals
If-None-Match: "etag/animals"

--a21beebd15b74be89539b137bbbc7293

Content-Type: application/http
Content-ID: <item5158:[email protected]>

GET /farm/v1/animals/pony

--a21beebd15b74be89539b137bbbc7293

Content-Type: application/http
Content-ID: <item0ec9:[email protected]>

PUT /farm/v1/animals/sheep
Content-Length: 63
Content-Type: application/json
If-Match: "etag/sheep"

{"animalAge": "5", "animalName": "sheep", "peltColor": "green"}

--a21beebd15b74be89539b137bbbc7293--
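To double-check that a generated body really is well-formed multipart/mixed, you can feed it back through Python's own email parser, which needs the Content-Type header (with the boundary) in front of the body. In this sketch, list_parts is a hypothetical helper and the sample bodies are made up. A part whose boundary line is followed directly by an empty line is parsed as a part with no headers, so its Content-ID comes back as None, which makes such problems easy to spot.

```python
import email


def list_parts(content_type, body):
    '''Returns (Content-ID, first body line) for each part of a multipart body.'''
    # the parser needs the Content-Type header (with the boundary) up front
    raw = b'Content-Type: ' + content_type.encode('ascii') + b'\r\n\r\n' + body
    message = email.message_from_bytes(raw)
    if not message.is_multipart():
        raise ValueError('body was not parsed as multipart')
    result = []
    for part in message.get_payload():
        lines = part.get_payload().splitlines()
        result.append((part['Content-ID'], lines[0] if lines else ''))
    return result


body = (b'--b\r\n'
        b'Content-Type: application/http\r\n'
        b'Content-ID: <item1>\r\n'
        b'\r\n'
        b'GET /farm/v1/animals/pony\r\n'
        b'\r\n'
        b'--b--\r\n')
print(list_parts('multipart/mixed; boundary=b', body))
# [('<item1>', 'GET /farm/v1/animals/pony')]
```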

In the end, I highly recommend the official Google library, but if you cannot use it, you will have to improvise a bit :)

Disclaimer: I haven't actually sent this request to the Google API endpoints, because the authentication procedure is too much of a hassle. I was just trying to get as close as possible to the HTTP request described in the batching guide. There might be some problems with \r and \n line endings, depending on how strict the Google endpoints are.
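Regarding the line endings, one cheap check before calling s.send(prepared) is to count the line feeds in prepared.body that are not part of a \r\n pair. This is a sketch with a hypothetical helper, bare_newline_count; a non-zero result only tells you where to look, not whether Google will actually reject the request.

```python
import re


def bare_newline_count(body):
    r'''Counts \n bytes in body that are not preceded by \r (hypothetical helper).'''
    # (?<!\r)\n matches a line feed that is not the second half of a \r\n pair
    return len(re.findall(rb'(?<!\r)\n', body))


print(bare_newline_count(b'GET /farm/v1/animals/pony\r\n'))  # 0
print(bare_newline_count(b'--b\r\n\nContent-Type: application/http\n'))  # 2
```

For example, calling bare_newline_count(prepared.body) right after finalize_request(prepared) shows whether any bare \n slipped into the body.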

Sources:

  • requests on GitHub (especially issues #935 and #1081)
  • requests API documentation
  • Google APIs for Python
answered Jan 04 '23 03:01 by Timo D