Reading .eml files with Python 3.6 using emaildata 0.3.4

I am using Python 3.6.1 and want to read email files (.eml) for processing. I am using the emaildata 0.3.4 package, but whenever I try to import the Text class as shown in the documentation, I get module errors:

import email
from email.text import Text
>>> ModuleNotFoundError: No module named 'cStringIO'

When I tried to correct this using this update, I got the next error, relating to mimetools:

>>> ModuleNotFoundError: No module named 'mimetools'

Is it possible to use emaildata 0.3.4 with Python 3.6 to parse .eml files? Or are there any other packages I can use to parse .eml files? Thanks

PyRsquared asked Aug 14 '17 16:08


1 Answer

emaildata is not compatible with Python 3, but the email package from the standard library can read .eml files. Open the file in binary mode and parse it with the BytesParser class. Then call get_body() with a preference for plain text, followed by get_content(), to get the text of the email.

import glob
from email import policy
from email.parser import BytesParser

file_list = glob.glob('*.eml')  # list of .eml files in the current directory
with open(file_list[2], 'rb') as fp:  # select a specific email file from the list
    msg = BytesParser(policy=policy.default).parse(fp)
# preferencelist should be a tuple, hence the trailing comma
text = msg.get_body(preferencelist=('plain',)).get_content()
print(text)  # print the email content
# Output:
# Hi,
# This is an email
# Regards,
# Mister. E

Granted, this is a simplified example: it does not handle HTML or attachments. But it does essentially what the question asks and what I wanted to do.

Here is how you would iterate over several emails and save each as a plain text file:

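import glob
import os
from email import policy
from email.parser import BytesParser

file_list = glob.glob('*.eml')  # list of .eml files in the current directory
for file in file_list:
    with open(file, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    fnm = os.path.splitext(file)[0] + '.txt'  # same name, .txt extension
    body = msg.get_body(preferencelist=('plain',))
    if body is None:  # skip messages without a plain-text part
        continue
    with open(fnm, 'w', encoding='utf-8') as f:
        f.write(body.get_content())  # write only the body text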
PyRsquared answered Oct 07 '22 14:10