I am using Python 3.6.1 and I want to read in email files (.eml) for processing. I am using the emaildata 0.3.4 package; however, whenever I try to import the Text class as shown in the documentation, I get module errors:
import email
from email.text import Text
>>> ModuleNotFoundError: No module named 'cStringIO'
When I tried to fix that using this update, I got the next error, this time relating to mimetools:
>>> ModuleNotFoundError: No module named 'mimetools'
Is it possible to use emaildata 0.3.4 with Python 3.6 to parse .eml files? Or are there other packages I can use to parse .eml files? Thanks
It is not compatible with Python 3. Consider using the email package from the standard library.
Thanks @Dmitri, I'll include an answer here using the email package for completeness.
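The two modules named in the tracebacks are Python 2-only: Python 3 replaced cStringIO with io.StringIO / io.BytesIO, and mimetools with the email package. That is why the emaildata imports fail. A minimal sketch that confirms this on the running interpreter using importlib.util.find_spec (the helper name missing_modules is hypothetical):

```python
import importlib.util

# Modules from the tracebacks above; both were removed in Python 3.
PY2_ONLY = ("cStringIO", "mimetools")


def missing_modules(names=PY2_ONLY):
    """Return the names from `names` that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules())  # on Python 3: ['cStringIO', 'mimetools']
```

If both names come back as missing, no patch to those two imports alone will make the package work; porting it or switching to the standard library email package is the realistic option.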
Using the email package, we can read in the .eml files. First, open the file in binary mode and parse it with the BytesParser class (from email.parser), using policy.default. Then call the get_body() method with a preference for 'plain' (plain text) to select the body part, and call get_content() on that part to get the raw text of the email.
from email import policy
from email.parser import BytesParser
import glob
file_list = glob.glob('*.eml')  # returns a list of .eml files in the current directory
with open(file_list[2], 'rb') as fp:  # select a specific email file from the list (here the third one)
    msg = BytesParser(policy=policy.default).parse(fp)
# ('plain',) needs the trailing comma to be a one-element tuple; ('plain') is just a string
text = msg.get_body(preferencelist=('plain',)).get_content()
print(text)  # print the email content
>>> "Hi,
>>> This is an email
>>> Regards,
>>> Mister. E"
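Granted, this is a simplified example: it does not handle HTML-only emails or attachments (if an email has no text/plain part, get_body() returns None and the get_content() call fails). But it does essentially what the question asks, and what I wanted to do.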
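To cover those two gaps, a minimal sketch that extends the example above: it passes ('plain', 'html') as the preference list so HTML-only emails fall back to their HTML part (returned as raw markup, not rendered text), and it lists attachment file names via iter_attachments(). It parses from bytes with BytesParser.parsebytes so it can run on an in-memory sample; the function name extract_text_and_attachments and the sample message are hypothetical.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_text_and_attachments(raw_bytes):
    """Parse raw .eml bytes; return (body_text, [attachment file names]).

    Prefers text/plain, falls back to text/html; body_text is None if neither exists.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    body = msg.get_body(preferencelist=('plain', 'html'))
    text = body.get_content() if body is not None else None
    names = [part.get_filename() for part in msg.iter_attachments()]
    return text, names


# Hypothetical sample: plain + HTML alternatives plus one binary attachment.
sample = EmailMessage()
sample['Subject'] = 'Test'
sample.set_content('Hi,\nThis is an email\n')
sample.add_alternative('<p>Hi,<br>This is an email</p>', subtype='html')
sample.add_attachment(b'\x00\x01', maintype='application',
                      subtype='octet-stream', filename='data.bin')

text, attachments = extract_text_and_attachments(sample.as_bytes())
print(text)
print(attachments)
```

For a file on disk, read it with open(path, 'rb').read() and pass the bytes in. The plain part wins because it comes first in the preference list.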
Here is how you would iterate over several emails and save each as a plain text file:
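import glob
import os
from email import policy
from email.parser import BytesParser
file_list = glob.glob('*.eml')  # returns a list of .eml files
for file in file_list:
    with open(file, 'rb') as fp:
        msg = BytesParser(policy=policy.default).parse(fp)
    fnm = os.path.splitext(file)[0] + '.txt'  # same base name, .txt extension
    txt = msg.get_body(preferencelist=('plain',)).get_content()
    with open(fnm, 'w', encoding='utf-8') as f:
        print('Filename:', file, file=f)  # first line: the source .eml name
        print(txt, file=f)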
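A variant of the loop above, sketched with pathlib under the assumption that skipping unusable messages is acceptable: it falls back to the HTML part when there is no plain-text part, skips messages with neither instead of crashing, and writes the Subject header as the first line. The function name convert_folder and the demo message are hypothetical.

```python
import tempfile
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path


def convert_folder(folder='.'):
    """Write a .txt file next to each .eml file in `folder`; return the paths written.

    Prefers the text/plain body and falls back to text/html; messages with
    neither are skipped instead of crashing on None.get_content().
    """
    written = []
    for eml_path in sorted(Path(folder).glob('*.eml')):
        with eml_path.open('rb') as fp:
            msg = BytesParser(policy=policy.default).parse(fp)
        body = msg.get_body(preferencelist=('plain', 'html'))
        if body is None:
            continue  # no usable text body in this message
        txt_path = eml_path.with_suffix('.txt')  # e.g. a.eml -> a.txt
        with txt_path.open('w', encoding='utf-8') as out:
            # msg['Subject'] is None when the header is missing, so print() writes "None"
            print('Subject:', msg['Subject'], file=out)
            print(body.get_content(), file=out)
        written.append(txt_path)
    return written


# Usage demo on a throwaway folder containing one hypothetical message.
with tempfile.TemporaryDirectory() as tmp:
    demo = EmailMessage()
    demo['Subject'] = 'Hello'
    demo.set_content('Body text\n')
    (Path(tmp) / 'a.eml').write_bytes(demo.as_bytes())
    written = convert_folder(tmp)
    written_names = [p.name for p in written]
    demo_output = written[0].read_text(encoding='utf-8')

print(written_names)
print(demo_output)
```

Returning the list of written paths makes it easy to check afterwards which emails were converted and which were skipped.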