I am working on setting up a script that forwards incoming mail to a list of recipients.
Here's what I have now:
I read the email from stdin (that's how postfix passes it):
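```python
import sys
from email.parser import Parser

email_in = sys.stdin.read()
incoming = Parser().parsestr(email_in)  # parsestr() takes a str; parse() expects a file object
sender = incoming['from']
this_address = incoming['to']
```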
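Reading `sys.stdin` as text means Python decodes the incoming bytes with the locale's encoding before the `email` package ever sees them. A minimal sketch of an alternative, assuming Python 3.6+ and that the script is free to read `sys.stdin.buffer` instead: parse the raw bytes with `email.message_from_bytes()` and the modern `policy.default`, whose `get_content()` applies both the transfer encoding and the declared charset. The helper name `parse_raw_message` and the sample message are hypothetical.

```python
import email
from email import policy

def parse_raw_message(raw: bytes):
    """Parse a raw message given as bytes.

    In the Postfix script this would be called as
    parse_raw_message(sys.stdin.buffer.read()), so no text decoding
    happens before the email package sees the message.
    """
    return email.message_from_bytes(raw, policy=policy.default)

# Hypothetical incoming message with an 8bit UTF-8 body
raw = (
    b"From: alice@example.com\n"
    b"To: list@example.com\n"
    b"Subject: test\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
) + "Grüße\n".encode("utf-8")

msg = parse_raw_message(raw)
# get_content() undoes the transfer encoding and applies the charset
print(msg["subject"], msg.get_content().strip())
```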
I test for multipart:
```python
if incoming.is_multipart():
    for payload in incoming.get_payload():
        # if payload.is_multipart(): ...
        body = payload.get_payload()
else:
    body = incoming.get_payload(decode=True)
```
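The loop above only looks one level deep and overwrites `body` on every iteration. A sketch of a more general traversal, using `Message.walk()` to visit nested parts and decoding each text part with its own declared charset (the UTF-8 fallback is an assumption); the helper `text_parts` and the sample message are hypothetical:

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def text_parts(msg):
    """Yield (content_type, text) for every text/* leaf part of msg."""
    for part in msg.walk():  # visits msg itself and all nested parts
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        raw = part.get_payload(decode=True)  # bytes, base64/QP undone
        charset = part.get_content_charset() or "utf-8"  # assumed fallback
        yield part.get_content_type(), raw.decode(charset, errors="replace")

# Hypothetical incoming mail: a multipart/alternative with two UTF-8 parts
alt = MIMEMultipart("alternative")
alt.attach(MIMEText("Grüße", "plain", "utf-8"))
alt.attach(MIMEText("<p>Grüße</p>", "html", "utf-8"))
incoming = message_from_string(alt.as_string())

parts = list(text_parts(incoming))
print(parts)
```

A forwarding script could then pick the `text/html` entry (or the `text/plain` one) as the body to send on.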
I set up the outgoing message:
```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart()
msg['Subject'] = incoming['subject']
msg['From'] = this_address
msg['reply-to'] = sender
msg['To'] = "[email protected]"
msg.attach(MIMEText(body.encode('utf-8'), 'html', _charset='UTF-8'))
s = smtplib.SMTP('localhost')
s.send_message(msg)
s.quit()
```
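Note that `body.encode('utf-8')` hands `MIMEText` bytes; if `body` is the still-base64 string from the multipart branch, those bytes get base64-encoded a second time, so the recipient sees base64 text instead of the message. A sketch, assuming the body has already been decoded to `str`, that lets `MIMEText` do the encoding and then round-trips the result locally instead of sending it; `build_forward` and all addresses are hypothetical placeholders:

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_forward(subject, from_addr, reply_to, to_addr, html_body):
    """Build the outgoing message from an already-decoded str body.

    Passing a str together with _charset="utf-8" lets MIMEText encode the
    body itself (base64 for UTF-8), so nothing is encoded twice.
    """
    msg = MIMEMultipart()
    msg["Subject"] = subject
    msg["From"] = from_addr
    msg["Reply-To"] = reply_to
    msg["To"] = to_addr
    msg.attach(MIMEText(html_body, "html", _charset="utf-8"))
    return msg

out = build_forward("Fwd: test", "list@example.com", "alice@example.com",
                    "bob@example.com", "<p>Grüße aus Köln</p>")

# Local round-trip check instead of smtplib: serialize, parse, decode.
roundtrip = message_from_bytes(out.as_bytes())
part = roundtrip.get_payload()[0]
text = part.get_payload(decode=True).decode(part.get_content_charset())
print(text)  # <p>Grüße aus Köln</p>
```

In the actual script, `s.send_message(out)` would then send it exactly as before.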
This works pretty well with ASCII characters (English text): the message is forwarded and everything.
When I send non-ASCII characters, though, the forwarded mail arrives as gibberish (depending on the email client, either raw bytes or ASCII representations of the UTF-8 characters).
What can be the problem? Is it on the incoming or the outgoing side?
UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the '8' means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.)
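A short illustration of the point above and of the kind of "gibberish" described in the question: a non-ASCII character such as 'é' becomes two bytes in UTF-8, decoding those bytes with a single-byte codec such as Latin-1 yields mojibake, and base64 turns them into plain ASCII that stays unreadable until it is decoded again. The word "café" is just an example:

```python
import base64

word = "café"
utf8 = word.encode("utf-8")          # 'é' (U+00E9) becomes two bytes
print(len(word), len(utf8))          # 4 5
print(utf8)                          # b'caf\xc3\xa9'

# Decoding the same bytes with the wrong codec gives mojibake
mojibake = utf8.decode("latin-1")
print(mojibake)                      # cafÃ©

# base64 turns the bytes into ASCII text that must be decoded again
b64 = base64.b64encode(utf8)
print(b64)                           # b'Y2Fmw6k='
print(base64.b64decode(b64).decode("utf-8"))  # café
```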
`email.encoders.encode_base64(msg)` encodes the payload into base64 form and sets the `Content-Transfer-Encoding` header to `base64`.
The default encoding of Python source files is UTF-8. JSON, TOML, and YAML use UTF-8. Most text editors, including Visual Studio Code and Windows Notepad, use UTF-8 by default. Most websites and text data on the internet use UTF-8. And many other popular programming languages, including Node.
A character set is the set of valid characters that a programming language accepts in source code. Here we are talking about Python, so the Python character set is the set of characters recognized by the Python language.
The problem is that many email clients (including Gmail) send non-ASCII emails base64-encoded. `stdin`, on the other hand, passes everything into a string. If you parse that string with the `Parser`, calling `get_payload()` on such a part returns a `str` that still has the base64 text inside.
Instead, use the optional `decode` argument of the `get_payload()` method. When it is set, the method undoes the transfer encoding and returns a `bytes` object. After that, you can use the `bytes.decode()` method to get a UTF-8 string, like so:
```python
body = payload.get_payload(decode=True)  # bytes, base64 already undone
body = body.decode('utf-8')              # str
```
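A self-contained sketch of the difference, using a hypothetical base64-encoded message parsed from a string, the way the script receives it. As a small variation on the snippet above (not the author's code), it decodes with the charset declared in the part's `Content-Type` via `get_content_charset()`, falling back to UTF-8 only when none is given:

```python
from email.parser import Parser

# Hypothetical single-part message, base64-encoded the way many clients
# send non-ASCII text ("café" in UTF-8 is Y2Fmw6k= in base64).
raw = (
    "From: alice@example.com\n"
    "To: list@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "Y2Fmw6k=\n"
)
incoming = Parser().parsestr(raw)

encoded = incoming.get_payload()             # str, still base64 text
body = incoming.get_payload(decode=True)     # bytes, transfer encoding undone
charset = incoming.get_content_charset() or "utf-8"
text = body.decode(charset)
print(text)  # café
```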
There is great insight into UTF-8 and Python in Ned Batchelder's talk.
My final code works a bit differently; you can check it out here, too.