Some mail clients don't set the References header, but set Thread-Index instead.
Is there a way to parse this header in Python?
Related: How does the email header field 'thread-index' work?
Mail 1
Date: Tue, 2 Dec 2014 08:21:00 +0000
Thread-Index: AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDg==
Mail 2 (which is related to Mail 1)
Date: Mon, 8 Dec 2014 13:12:13 +0000
Thread-Index: AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw
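As a quick check on the sample data, here is a small sketch using only the standard library's base64 module. The variable names are just for illustration. Decoding both values suggests that Mail 2's index is Mail 1's index with 5 extra bytes appended:

```python
import base64

# Thread-Index values copied from the two sample mails above.
mail1 = base64.b64decode("AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDg==")
mail2 = base64.b64decode("AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw")

print(len(mail1), len(mail2))    # 22 27
print(mail2.startswith(mail1))   # True
```

This only shows the relationship for these two samples; it says nothing yet about what the bytes mean.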
Update
I want to be able to link these two mails in my application. It already works perfectly for the well-known References and In-Reply-To headers.
Using the info here, I was able to put the following together:
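import struct, datetime

def parse_thread_index(index):
    # Python 2 only: str.decode('base64') does not exist in Python 3
    s = index.decode('base64')

    # bytes 6..21 hold the GUID that identifies the thread
    guid = struct.unpack('>IHHQ', s[6:22])
    guid = '{%08X-%04X-%04X-%04X-%012X}' % (guid[0], guid[1], guid[2], (guid[3] >> 48) & 0xFFFF, guid[3] & 0xFFFFFFFFFFFF)

    # bytes 0..5 are the top 48 bits of a FILETIME (100 ns ticks since 1601-01-01)
    f = struct.unpack('>Q', s[:6] + '\0\0')[0]
    ts = [datetime.datetime(1601, 1, 1) + datetime.timedelta(microseconds=f//10)]

    # every reply appends a 5-byte block; its first 4 bytes are a time delta
    for n in range(22, len(s), 5):
        f = struct.unpack('>I', s[n:n+4])[0]
        ts.append(ts[-1] + datetime.timedelta(microseconds=(f<<18)//10))

    return guid, ts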
Given a thread index, it returns a tuple (guid, [list of dates]). For your test data, the result is:
> parse_thread_index('AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw')
('{27F26E41-224C-4139-9243-CF9D56CD880E}', [datetime.datetime(2014, 12, 2, 8, 9, 6, 673459), datetime.datetime(2014, 12, 8, 13, 11, 0, 807475)])
I don't have enough test data at hand, so this code might be buggy. Feel free to let me know.
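The function above relies on str.decode('base64'), which only exists in Python 2. Below is a rough Python 3 sketch of the same logic. The name parse_thread_index_py3 and the length check are my own additions, and it makes the same assumptions about the byte layout as the original (including treating every 5-byte block's delta as shifted by 18 bits), so it may share the same bugs:

```python
import base64
import datetime
import struct

FILETIME_EPOCH = datetime.datetime(1601, 1, 1)


def parse_thread_index_py3(index):
    """Python 3 port of parse_thread_index; returns (guid, [datetimes])."""
    s = base64.b64decode(index)
    if len(s) < 22:
        raise ValueError("Thread-Index is shorter than the 22-byte header")

    # Bytes 6..21: the GUID shared by every mail in the thread.
    a, b, c, d = struct.unpack(">IHHQ", s[6:22])
    guid = "{%08X-%04X-%04X-%04X-%012X}" % (a, b, c, d >> 48, d & 0xFFFFFFFFFFFF)

    # Bytes 0..5: top 48 bits of a FILETIME (100 ns ticks since 1601-01-01).
    ticks = struct.unpack(">Q", s[:6] + b"\0\0")[0]
    ts = [FILETIME_EPOCH + datetime.timedelta(microseconds=ticks // 10)]

    # Each reply appends a 5-byte block; only complete blocks are read.
    for n in range(22, len(s) - 4, 5):
        delta = struct.unpack(">I", s[n:n + 4])[0]
        ts.append(ts[-1] + datetime.timedelta(microseconds=(delta << 18) // 10))

    return guid, ts


# Linking the two sample mails from the question by comparing their GUIDs.
guid1, ts1 = parse_thread_index_py3("AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDg==")
guid2, ts2 = parse_thread_index_py3("AdAOBz5QJ/JuQSJMQTmSQ8+dVs2IDgE4StZw")
print(guid1 == guid2)      # True
print(len(ts1), len(ts2))  # 1 2
```

For linking mails in an application, comparing the GUID part should be enough; the number of dates gives a hint about how deep in the thread a mail sits.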