I am having a Unicode issue with the contents of a variable when writing to a .pdf with Python.
It outputs this error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013'
Basically, it is getting caught on a dash character (U+2013, which is an en dash).
I have tried taking that variable, whose contents include the dash, and redefining it with '.encode('utf-8')', for example:
Body = msg.Body
BodyC = Body.encode('utf-8')
And now I get the below error:
Traceback (most recent call last):
  File "script.py", line 37, in <module>
    pdf.cell(200, 10, txt="Bod: " + BodyC, ln=4, align="C")
TypeError: can only concatenate str (not "bytes") to str
Below is my full code. How could I simply fix the Unicode error in the contents of the 'Body' variable? I am open to converting to utf-8 or western, or anything outside of 'latin-1'. Any suggestions?
Full Code:
from fpdf import FPDF
import win32com.client
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
msg = outlook.OpenSharedItem(r"C:\User\language\python\Msg-To-PDF\test_msg.msg")
print (msg.SenderName)
print (msg.SenderEmailAddress)
print (msg.SentOn)
print (msg.To)
print (msg.CC)
print (msg.BCC)
print (msg.Subject)
print (msg.Body)
SenderName = msg.SenderName
SenderEmailAddress = msg.SenderEmailAddress
SentOn = msg.SentOn
To = msg.To
CC = msg.CC
BCC = msg.BCC
Subject = msg.Subject
Body = msg.Body
BodyC = Body.encode('utf-8')
pdf = FPDF()
pdf.add_page()
# pdf.add_font('DejaVu', '', 'DejaVuSansCondensed.ttf', uni=True)
pdf.set_font("Helvetica", style = '', size = 11)
pdf.cell(200, 10, txt="From: " + SenderName, ln=1, align="C")
# pdf.cell(200, 10, border=SentOn, ln=1, align="C")
pdf.cell(200, 10, txt="To: " + To, ln=1, align="C")
pdf.cell(200, 10, txt="CC: " + CC, ln=1, align="C")
pdf.cell(200, 10, txt="BCC: " + BCC, ln=1, align="C")
pdf.cell(200, 10, txt="Subject: " + Subject, ln=1, align="C")
pdf.cell(200, 10, txt="Bod: " + BodyC, ln=4, align="C")
pdf.output("Sample.pdf")
A workaround is to convert all text to latin-1 encoding before passing it on to the library. You can do that with the following command:
text2 = text.encode('latin-1', 'replace').decode('latin-1')
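text2 will be free of any non-latin-1 characters. However, some characters may be replaced with '?'. Note that .encode() on its own returns bytes, which is why concatenating BodyC with a str raised the TypeError; the trailing .decode('latin-1') turns the result back into a str.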
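As a rough sketch of the workaround above, the conversion can be wrapped in a small helper that first swaps common typographic punctuation for ASCII look-alikes, so that an en dash becomes '-' instead of '?'. The helper name to_latin1 and the PUNCTUATION table are my own assumptions for illustration and are not part of fpdf; extend the table as needed.

```python
# Hypothetical helper: the name and the replacement table are illustrative only.
# Map common typographic characters to plain ASCII before the latin-1 round trip.
PUNCTUATION = str.maketrans({
    "\u2013": "-",    # en dash (the character from the error message)
    "\u2014": "--",   # em dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
})


def to_latin1(text):
    """Return a str that contains only characters latin-1 can encode."""
    text = text.translate(PUNCTUATION)
    # Anything still outside latin-1 is replaced with '?'.
    # decode() turns the bytes back into a str, so it can be concatenated.
    return text.encode("latin-1", "replace").decode("latin-1")


sample = "10\u201320 items \u2192 done, caf\u00e9"
print(to_latin1(sample))
```

In the question's script this would presumably be used as txt="Bod: " + to_latin1(Body) in place of BodyC. Characters that latin-1 already covers, such as 'é', pass through unchanged; only the unmapped ones (the arrow in the sample) fall back to '?'.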