I need to retrieve .zip archives, retrieve the file within the .zip and extract its data. The .zip archives are attached to email message files; I am not using a mail protocol to access the mailbox. I am able to parse the messages...
...
from email.parser import Parser
...
for fileName in os.listdir(mailDir):
    ...
    message = Parser().parse(open(mailDir + '/' + fileName, 'r'))
    ...
    for part in message.walk():
        if part.get_content_type() == 'application/octet-stream':
When I first started writing this code, I was testing against an email with a .csv attached and had no problem accessing the attachment and pulling the data out. Now that I'm working with emails that carry a .zip (containing the previously used .csv), I'm stuck. I added...
import zipfile
...but it seems I need to actually save the attached .zip to the filesystem to be able to use zipfile. I'd rather not do this and thought (hoped) I could simply use...
zipfile.ZipFile(the_base64_string_from_message, 'r')
but that failed. How can I access the archive without creating a .zip archive in the filesystem? Also, maybe I shouldn't be using the email module at all (I only used it so I could easily find the attachment)?
What you are probably looking for is the StringIO module, which wraps up a string to give it the interface of a file. Also, you need to decode the email attachment payload from base64 so that you are dealing with the correct bytes. Here is an example which unzips the attachment into the current working directory:
import email
import zipfile
from cStringIO import StringIO
import base64

with open('some_email_with_zip.eml', 'r') as f:
    m = email.message_from_file(f)

for part in m.walk():
    # You might also check to see if the content-type for your zip files is
    # application/zip instead of application/octet-stream
    if part.get_content_type() == 'application/zip':
        zip_bytes = base64.b64decode(part.get_payload())
        file_wrapper = StringIO(zip_bytes)
        if zipfile.is_zipfile(file_wrapper):
            with zipfile.ZipFile(file_wrapper, 'r') as zf:
                zf.extractall()
If you want to specify a different path than the current directory for the unzipped files, you can specify that as a parameter to extractall():
zf.extractall('/path/for/unzipped/files')
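The example above targets Python 2, where the cStringIO module exists. On Python 3 the same idea can be sketched with io.BytesIO, which wraps bytes in a file-like object. This is only a rough sketch under a few assumptions: the function name zip_attachments is hypothetical, the message is available as raw bytes, and zip attachments are labelled either application/zip or application/octet-stream. It also uses get_payload(decode=True) so the email module undoes the transfer encoding instead of calling base64 by hand.

```python
import email
import io
import zipfile
from email import policy


def zip_attachments(raw_bytes):
    """Yield a zipfile.ZipFile for each zip attachment found in a raw message.

    Hypothetical helper: nothing is written to the filesystem.
    """
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in message.walk():
        # Assumption: zip files arrive under one of these two content types.
        if part.get_content_type() not in ("application/zip",
                                           "application/octet-stream"):
            continue
        # decode=True reverses the Content-Transfer-Encoding (e.g. base64).
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        buffer = io.BytesIO(payload)
        # Skip octet-stream parts that are not actually zip archives.
        if zipfile.is_zipfile(buffer):
            yield zipfile.ZipFile(buffer, "r")
```

Each yielded object behaves like any other ZipFile, so namelist(), read() and extractall() are all available on it.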
StringIO was the magic I was missing; here's the solution...
import base64, StringIO, zipfile
# base64 string from the message
attachment = '...'
attachment = base64.b64decode(attachment)
attachment = StringIO.StringIO(attachment)
zipFile = zipfile.ZipFile(attachment, 'r')
Yields a zipfile.ZipFile instance.
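Since the original goal was to pull the data out of the .csv inside the archive, here is a minimal Python 3 sketch of that last step, done entirely in memory. The helper name read_csv_rows is hypothetical, and it assumes the member is UTF-8 text and, when no name is given, that the first entry in the archive is the file you want.

```python
import csv
import io
import zipfile


def read_csv_rows(zip_file, member_name=None):
    """Return the rows of a CSV member of an open ZipFile as lists of strings."""
    if member_name is None:
        # Assumption: the archive holds the CSV as its first entry.
        member_name = zip_file.namelist()[0]
    with zip_file.open(member_name) as raw:
        # ZipFile.open() yields bytes; csv.reader needs text.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.reader(text))
```

With the zipFile object from the snippet above (under Python 3, built from io.BytesIO rather than StringIO), read_csv_rows(zipFile) would return the parsed rows without extracting anything to disk.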