
Python 2.7 - Extract Zip From Email Message File

I need to retrieve .zip archives, open the file inside each one, and extract its data. The .zip archives are attached to email message files; I am not using a mail protocol to access the mailbox. I am able to parse the messages...

...
from email.parser import Parser
...
for fileName in os.listdir(mailDir):
    ...
    message = Parser().parse(open(mailDir + '/' + fileName, 'r'))
    ...
    for part in message.walk():
        if part.get_content_type() == 'application/octet-stream':

When I first started writing this code, I was testing against an email that had a .csv attached, and I had no problems accessing the attachment and pulling the data out. Now that I'm working against emails with a .zip attached (containing the previously used .csv), I'm stuck. I added...

import zipfile

...but it seems I need to actually save the attached .zip to the filesystem to be able to use zipfile. I'd rather not do this and thought (hoped) I could simply use...

zipfile.ZipFile(the_base64_string_from_message, 'r')

but that failed. How can I access the archive without creating a .zip archive in the filesystem? Additionally, maybe I shouldn't even be using the email module (I only used it so I could easily find the attachment)?

user1801810 asked Oct 21 '13 21:10


2 Answers

What you are probably looking for is the StringIO module, which wraps a string to give it the interface of a file. zipfile.ZipFile accepts either a filename or a file-like object, so a bare string is treated as a path, which is why passing the base64 string failed. You also need to decode the email attachment payload from base64 so that you are dealing with the raw zip bytes. Here is an example which unzips the attachment into the current working directory:

import email
import zipfile
from cStringIO import StringIO
import base64

with open('some_email_with_zip.eml', 'r') as f:
    m = email.message_from_file(f)

for part in m.walk():
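    # Zip attachments may be labeled application/zip or the generic
    # application/octet-stream (as in the question), so accept both;
    # is_zipfile() below filters out anything that is not really a zip.
    if part.get_content_type() in ('application/zip', 'application/octet-stream'):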
        zip_bytes = base64.b64decode(part.get_payload())
        file_wrapper = StringIO(zip_bytes)
        if zipfile.is_zipfile(file_wrapper):
            with zipfile.ZipFile(file_wrapper, 'r') as zf:
                zf.extractall()

To unzip the files somewhere other than the current directory, pass the target path to extractall():

zf.extractall('/path/for/unzipped/files')
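If the files should never touch the disk at all, one variation is to let the email package undo the transfer encoding with get_payload(decode=True) and read each member straight out of the archive. This is only a minimal sketch: the helper name zip_members_from_message is made up for illustration, and it uses io.BytesIO (available in Python 2.6+ and 3) in place of cStringIO.

```python
import email
import io
import zipfile


def zip_members_from_message(message):
    """Return {member name: bytes} for every zip attachment in a message.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    members = {}
    for part in message.walk():
        if part.is_multipart():
            continue  # container parts carry no payload of their own
        # decode=True undoes the Content-Transfer-Encoding (e.g. base64)
        payload = part.get_payload(decode=True)
        if not payload:
            continue
        buf = io.BytesIO(payload)
        # Sniff the bytes rather than trusting the declared content type
        if zipfile.is_zipfile(buf):
            buf.seek(0)
            with zipfile.ZipFile(buf, 'r') as zf:
                for name in zf.namelist():
                    members[name] = zf.read(name)
    return members
```

With a parsed message m, as in the example above, zip_members_from_message(m) would give back the raw bytes of each archived file, keyed by its name inside the zip.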

Christian Abbott answered Oct 20 '22 00:10


StringIO was the magic I was missing; here's the solution...

import base64, StringIO, zipfile

# base64 string from the message
attachment = '...'
attachment = base64.b64decode(attachment)
attachment = StringIO.StringIO(attachment)

zipFile = zipfile.ZipFile(attachment, 'r')

Yields a zipfile.ZipFile instance.
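From there the .csv can be read without extracting anything. Below is a rough sketch of that step, written for Python 3; the function name csv_rows_from_zip_bytes and the UTF-8 default are assumptions for illustration, not something from the original messages. On Python 2.7 the csv module works on byte strings, so the decode step would likely need to be dropped there.

```python
import csv
import io
import zipfile


def csv_rows_from_zip_bytes(zip_bytes, encoding='utf-8'):
    """Yield (member name, row) for each row of each .csv in a zip archive.

    zip_bytes is the already base64-decoded attachment. Illustrative helper;
    the encoding default is an assumption about the data.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes), 'r') as zf:
        for name in zf.namelist():
            if not name.lower().endswith('.csv'):
                continue  # skip anything in the archive that is not a CSV
            # Read the member into memory and decode it to text
            text = zf.read(name).decode(encoding)
            for row in csv.reader(io.StringIO(text, newline='')):
                yield name, row
```

Calling list(csv_rows_from_zip_bytes(decoded_attachment)) would collect every row in memory; iterating lazily is probably the better fit for large attachments.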

user1801810 answered Oct 20 '22 01:10