
Parsing Thread-Index Mail Header with Python

Some mail clients don't set the References header, but they do set Thread-Index.

Is there a way to parse this header in Python?

Related: How does the email header field 'thread-index' work?

Mail 1

Date: Tue, 2 Dec 2014 08:21:00 +0000
Thread-Index: AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDg==

Mail 2 (which is related to Mail 1)

Date: Mon, 8 Dec 2014 13:12:13 +0000
Thread-Index: AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw

Update

I want to be able to link these two mails in my application. It already works perfectly for the well-known References and In-Reply-To headers.

guettli asked Dec 09 '14 08:12



1 Answer

Using the info here, I was able to put the following together:

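import base64, datetime, struct

def parse_thread_index(index):
    # Python 3: str.decode('base64') is gone, so use the base64 module.
    s = base64.b64decode(index)

    # Bytes 6..21 hold the 16-byte GUID; %012X zero-pads the last group.
    guid = struct.unpack('>IHHQ', s[6:22])
    guid = '{%08X-%04X-%04X-%04X-%012X}' % (guid[0], guid[1], guid[2], (guid[3] >> 48) & 0xFFFF, guid[3] & 0xFFFFFFFFFFFF)

    # Bytes 0..5 are the top 6 bytes of a FILETIME (100 ns ticks since 1601-01-01); pad to 8 bytes.
    f = struct.unpack('>Q', s[:6] + b'\0\0')[0]
    ts = [datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=f//10)]

    # Each reply appends 5 bytes; the first 4 are read as a time delta relative to the previous entry.
    for n in range(22, len(s), 5):
        f = struct.unpack('>I', s[n:n+4])[0]
        ts.append(ts[-1] + datetime.timedelta(microseconds=(f<<18)//10))

    return guid, ts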

Given a thread index, it returns a tuple (guid, [list of dates]). For your test data, the result is:

>>> parse_thread_index('AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw')
('{27F26E41-224C-4139-9243-CF9D56CD880E}', [datetime.datetime(2014, 12, 2, 8, 9, 6, 673459), datetime.datetime(2014, 12, 8, 13, 11, 0, 807475)])
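To link the two mails from the question, comparing raw bytes may be enough, without parsing any timestamps. Assuming the layout used above (a 22-byte header followed by one 5-byte block per reply), a reply's decoded index starts with its parent's decoded index. The following is only a sketch for Python 3; `thread_root` and `is_reply_to` are hypothetical helper names, not part of any library.

```python
import base64

HEADER_LEN = 22  # assumed layout: 6 timestamp bytes + 16 GUID bytes


def thread_root(index):
    """Return the 22-byte header shared by all mails of one conversation."""
    raw = base64.b64decode(index)
    if len(raw) < HEADER_LEN:
        raise ValueError('Thread-Index too short: %d bytes' % len(raw))
    return raw[:HEADER_LEN]


def is_reply_to(child, parent):
    """True if the decoded `child` index extends the decoded `parent` index."""
    c = base64.b64decode(child)
    p = base64.b64decode(parent)
    return len(c) > len(p) and c.startswith(p)


mail1 = 'AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDg=='
mail2 = 'AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw'

print(thread_root(mail1) == thread_root(mail2))  # True
print(is_reply_to(mail2, mail1))                 # True
```

Grouping mails by `thread_root(...)` would put both test mails into one conversation, while `is_reply_to` gives the parent/child direction.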

I don't have enough test data at hand, so this code might be buggy. Feel free to let me know.
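Since the parser does no input checking, a cheap sanity check before calling it might catch malformed headers early. This Python 3 sketch assumes the same layout (22 header bytes plus a multiple of 5 bytes) and that the header value has already been unfolded and stripped of whitespace; `looks_like_thread_index` is a made-up name.

```python
import base64
import binascii


def looks_like_thread_index(index):
    """Return True if `index` decodes to 22 + 5*k bytes (assumed layout)."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(index, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) >= 22 and (len(raw) - 22) % 5 == 0
```

Both test values from the question pass this check; a value that fails it is probably not worth passing to `parse_thread_index`.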

georg answered Sep 20 '22 19:09
