I am using the Python API for Gmail. I am querying for some messages and retrieving them correctly, but the body of the messages looks like total nonsense, even when the MIME type is said to be text/plain or text/html.
I have been searching all over the API docs, but they keep saying it's a string, when it obviously must be some encoding... I thought it could be base64 encoding, but trying to decode it with Python's base64 module gives me TypeError: Incorrect padding, so either it's not base64 or I'm decoding badly.
I'd love to provide a good example, but since I'm handling sensitive information I'll have to obfuscate it a bit...
{
"payload": {
"mimeType": "multipart/mixed",
"filename": "",
"headers": [
...
],
"body": {
"size": 0
},
"parts": [
{
"mimeType": "multipart/alternative",
"filename": "",
"headers": [
{
"name": "Content-Type",
"value": "multipart/alternative; boundary=001a1140b160adc309053bd7ec57"
}
],
"body": {
"size": 0
},
"parts": [
{
"partId": "0.0",
"mimeType": "text/plain",
"filename": "",
"headers": [
{
"name": "Content-Type",
"value": "text/plain; charset=UTF-8"
},
{
"name": "Content-Transfer-Encoding",
"value": "quoted-printable"
}
],
"body": {
"size": 4067,
"data": "LS0tLS0tLS0tLSBGb3J3YXJkZWQgbWVzc2FnZSAtLS0tLS0tLS0tDQpGcm9tOiBMaW5rZWRJbiA8am9iLWFwcHNAbGlua2VkaW4uY29tPg0KRGF0ZTogU2F0LCBTZXAgMywgMjAxNiBhdCA5OjMwIEFNDQpTdWJqZWN0OiBBcHBsaWNhdGlvbiBmb3IgU2VuaW9yIEJhY2tlbmQgRGV2ZWxvcG..."
}
The field that I'm talking about is payload.parts[0].parts[0].body.data. I have truncated it at a random point, so I doubt it is decodable like that, but you get the point... What is that encoding?
Also, it wouldn't hurt to know where in the docs they explicitly say it's base64 (unless it's the standard encoding for MIME?).
UPDATE: So in the end there was some bad luck involved. I have 5 mails like this, and it turns out that the first one is malformed, for some unknown reason. After moving on to the other ones, I am able to decode all of them with the approaches suggested in the answers. Thank you all!
An important distinction: it is web-safe base64 encoded (aka "base64url"). The docs are not very good on this; MessagePartBody is best documented here: https://developers.google.com/gmail/api/v1/reference/users/messages/attachments
It says the type is "bytes" (which obviously isn't safe to transmit over JSON as-is), but I agree with you: it doesn't clearly specify that it's base64url encoded, as other "bytes" fields in the API are.
As for the padding issue, is it because you're truncating? If not, check that len(data) % 4 == 0. If that check fails, the API is returning unpadded data, which would be unexpected.
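The length check described above can be sketched as a small helper. This is only a minimal sketch, not something taken from the Gmail docs: the name decode_base64url is hypothetical, the sample strings are made up, and re-adding stripped "=" padding is an assumption about how unpadded data could be handled if it ever shows up.

```python
import base64


def decode_base64url(data: str) -> bytes:
    """Decode web-safe base64, restoring any '=' padding that was stripped."""
    # base64 text is read in groups of 4 characters, so pad up to the next multiple of 4.
    missing = -len(data) % 4
    return base64.urlsafe_b64decode(data + "=" * missing)


# Sample values made up for illustration; they are not taken from a real message.
print(decode_base64url("aGVsbG8"))   # input whose trailing '=' was stripped
print(decode_base64url("Pz8_Pj4-"))  # '_' and '-' are the base64url stand-ins for '/' and '+'
```

A string whose length is 1 more than a multiple of 4 is not valid base64 at all, so this helper will still raise an error for it rather than hide the problem.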
This is base64.
Your truncated message is:
---------- Forwarded message ----------
From: LinkedIn <job-apps@linkedin.com>
Date: Sat, Sep 3, 2016 at 9:30 AM
Subject: Application for Senior Backend Develop
I had to remove the last 3 characters from your truncated message because I was getting the same padding error as you. You probably have some garbage in the message you're trying to decode.
Here's some sample code:
import base64
body = "LS0tLS0tLS0tLSBGb3J3YXJkZWQgbWVzc2FnZSAtLS0tLS0tLS0tDQpGcm9tOiBMaW5rZWRJbiA8am9iLWFwcHNAbGlua2VkaW4uY29tPg0KRGF0ZTogU2F0LCBTZXAgMywgMjAxNiBhdCA5OjMwIEFNDQpTdWJqZWN0OiBBcHBsaWNhdGlvbiBmb3IgU2VuaW9yIEJhY2tlbmQgRGV2ZWxv"
result = base64.b64decode(body)  # returns bytes
print(result)
Here's a snippet for getting and decoding the message body. The decoding part was taken from the Gmail API documentation:
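import base64
import email
# service: an authorized Gmail API client; msg_id: the ID of the message to fetch.
message = service.users().messages().get(userId='me', id=msg_id, format='full').execute()
# Works when the body sits directly on the payload; multipart messages keep it under payload['parts'].
msg_bytes = base64.urlsafe_b64decode(message['payload']['body']['data'].encode('UTF8'))
mime_msg = email.message_from_bytes(msg_bytes)  # the decoded data is bytes in Python 3
print(msg_bytes)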
Reference doc: https://developers.google.com/gmail/api/v1/reference/users/messages/get#python
The following worked for me:
base64.urlsafe_b64decode(body).decode("utf-8")
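For a nested message like the one in the question, the data field first has to be located inside the parts tree. The following is a rough sketch of one way to do that, not the API's own method: find_part_data is a hypothetical helper, the payload dict is made up to mirror the question's structure, and decoding as UTF-8 assumes the charset given in the part's Content-Type header.

```python
import base64


def find_part_data(part: dict, mime_type: str = "text/plain"):
    """Return the decoded body of the first part with the given MIME type, or None."""
    data = part.get("body", {}).get("data")
    if part.get("mimeType") == mime_type and data:
        # Assumes the part is UTF-8 text, as in the question's Content-Type header.
        return base64.urlsafe_b64decode(data).decode("utf-8")
    # Multipart containers carry no data themselves; look inside their nested parts.
    for child in part.get("parts", []):
        text = find_part_data(child, mime_type)
        if text is not None:
            return text
    return None


# Made-up payload shaped like the one in the question:
# multipart/mixed > multipart/alternative > text/plain.
payload = {
    "mimeType": "multipart/mixed",
    "body": {"size": 0},
    "parts": [{
        "mimeType": "multipart/alternative",
        "body": {"size": 0},
        "parts": [{
            "mimeType": "text/plain",
            "body": {"size": 5, "data": "aGVsbG8="},
        }],
    }],
}
print(find_part_data(payload))  # prints the decoded text/plain body
```

With a real response, the same call would be made on message['payload'].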
It's base64. You can use base64.decodestring to read it (that function was removed in Python 3.9; base64.b64decode does the same job). The part of the message that you attached is: '---------- Forwarded message ----------\r\nFrom: LinkedIn <job-apps@linkedin.com>\r\nDate: Sat, Sep 3, 2016 at 9:30 AM\r\nSubject: Application for Senior Backend Develo'
The incorrect padding error means that you're decoding an incorrect number of characters. You're probably trying to decode a truncated message.
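To illustrate the point about truncation: a base64 string cut at an arbitrary position only decodes cleanly once it is trimmed back to a multiple of 4 characters. This is a minimal sketch with a made-up sample string and a hypothetical helper name, decode_truncated. Note that Python 3 reports the failure as binascii.Error rather than the TypeError that Python 2 raised.

```python
import base64
import binascii


def decode_truncated(data: str) -> bytes:
    """Decode as much of a cut-off base64 string as forms complete 4-character groups."""
    usable = len(data) - len(data) % 4
    return base64.urlsafe_b64decode(data[:usable])


# Made-up sample: encode a short string, then keep only the first 10 characters.
full = base64.urlsafe_b64encode(b"Forwarded message").decode("ascii")
cut = full[:10]  # simulates copying only part of the data field

try:
    base64.urlsafe_b64decode(cut)
except binascii.Error as exc:
    print("decoding the cut string failed:", exc)

print(decode_truncated(cut))  # decodes the complete groups only
```

This only recovers the leading part of the text; if the full data field from the API fails to decode, the message itself is more likely to be the problem.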