
python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>

I'm trying to write a script that reads data from an sqlite3 database, but I have run into a problem.

The field in the database is of type text and contains HTML-formatted text. See the text below:

<html> <head> <title>Yahoo!</title> </head> <body> <style type="text/css"> html {} .yshortcuts {border-bottom:none !important;} .ReadMsgBody {width:100%;} .ExternalClass{width:100%;} </style> <table cellpadding="0" cellspacing="0" bgcolor="#ffffff">     <tr> <td width="550" valign="top" align="left">      <table cellpadding="0" cellspacing="0" width="500">         <tr>             <td colspan="3"><img        src="http://mail.yimg.com/nq/assets/sharedmessages/v1/us/logo.gif" width="292" height="51" style="display:block;" border="0" alt="Yahoo! Mail"></td>         </tr>         <tr>             <td rowspan="3" width="1" bgcolor="#c7c4ca"></td>             <td width="498" height="1" bgcolor="#c7c4ca"></td>             <td rowspan="3" width="1" bgcolor="#c7c4ca"></td>         </tr>         <tr>             <td width="498" valign="top" align="left">             <table cellpadding="0" cellspacing="0">                 <tr>                     <td width="498" bgcolor="#61399d" align="left" valign="top">                     <table cellspacing="0" cellpadding="0"><tr><td height="24"></td></tr></table>                     <div style="font-family:Arial, Helvetica, sans-serif;font-size:23px;line-height:27px;margin-bottom:10px;color:#ffffff;margin-left:15px;"><span style="color:#ffffff;text-decoration:none;font-weight:bold;line-height:27px;">Välkommen till Yahoo! 
Mail.</span></div>                     <div style="font-family:Arial, Helvetica, sans-serif;font-size:22px;line-height:26px;margin-bottom:1px;color:#ffffff;margin-left:15px;margin-bottom:7px;margin-right:15px;">Ansluta och dela går snabbt och enkelt och är tillgängligt överallt.</div>                     </td>                 </tr>                 <tr>                     <td><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/b1.gif" width="498" height="18" style="display:block;" border="0"></td>                 </tr>             </table>             <table cellpadding="0" cellspacing="0" width="498">                 <tr>                     <td width="292" valign="top">                     <table cellpadding="0" cellspacing="0">                         <tr>                             <td><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/grad.gif" width="292" height="9" style="display:block;"></td>                         </tr>                         <tr>                             <td width="292" bgcolor="#ffffff" align="left" valign="top">                             <table cellspacing="0" cellpadding="0"><tr><td height="11"></td></tr></table>                             <div style="margin-left:15px;">                                                   <div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:18px;color:#333333;margin-bottom:11px;font-weight:bold;">Det är lätt som en plätt att komma igång.</div>                                 <table cellpadding="0" cellspacing="0" width="267">                                     <tr>                                         <td width="16" align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">1. 
</div></td>                                         <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;margin-bottom:9px;"><a rel="nofollow" target="_blank" href="http://us-mg999.mail.yahoo.com/neo/launch?action=contacts" style="text-decoration:underline;color:#61399d;"><span>Lägg till alla dina kontakter på en plats</span></a>.</div></td>                                     </tr>                                     <tr>                                         <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">2. </div></td>                                         <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;margin-bottom:9px;"><a rel="nofollow" target="_blank" href="http://mrd.mail.yahoo.com/themes" style="text-decoration:underline;color:#61399d;"><span>Anpassa din nya inkorg</span></a>.</div></td>                                     </tr>                                     <tr>                                         <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:14px;line-height:16px;color:#61399d;margin-bottom:9px;font-weight:bold;">3. 
</div></td>                                         <td align="left" valign="top"><div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#61399d;"><a rel="nofollow" target="_blank" href="http://se.overview.mail.yahoo.com/mobile" style="text-decoration:underline;color:#61399d;"><span>Anslut överallt på dina mobila enheter</span></a>.</div></td>                                     </tr>                                 </table>                              </div>                             </td>                         </tr>                         <tr><td height="13"></td></tr>                     </table>                     </td>                     <td width="196" valign="top">                     <table cellpadding="0" cellspacing="0">                         <tr>                             <td width="1" bgcolor="#fbfbfd" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g1.gif" width="1" height="21" style="display:block;"></td>                             <td width="1" bgcolor="#f5f6fa" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g2.gif" width="1" height="21" style="display:block;"></td>                             <td width="1" bgcolor="#e8eaf1" valign="top"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/g3.gif" width="1" height="21" style="display:block;"></td>                             <td width="1" bgcolor="#d4d4d4"></td>                             <td width="186" bgcolor="#f0f0f0" align="left" valign="top">                               <table cellspacing="0" cellpadding="0"><tr><td height="3">   </td></tr></table>                             <div style="margin-left:11px;">                             <div style="font-family:Arial, Helvetica, sans-serif;font-size:13px;line-height:16px;color:#333333;margin-bottom:9px;"><b>Info för dig:</b></div>                             <div style="font-family:Arial, Helvetica, 
sans-serif;font-size:12px;color:#43494e;line-height:18px;margin-bottom:10px;">                             Yahoo!-ID och e-postadress:<br />                             <div style="font-family:Arial, Helvetica, sans-serif;font-size:12px;color:#43494e;line-height:18px;">                             Håll ditt konto och inställningar aktuella. <br><a rel="nofollow" target="_blank" href="https://edit.yahoo.com/config/eval_profile" style="text-decoration:underline;color:#61399d;"><span>Mitt konto</span></a>                              </div>                             </div>                             <table cellspacing="0" cellpadding="0"><tr><td height="20"></td></tr></table>                             </td>                             <td width="1" bgcolor="#dbdbdb"></td>                             <td width="1" bgcolor="#ced2de"></td>                             <td width="1" bgcolor="#dbdfed"></td>                             <td width="1" bgcolor="#e8ebf3"></td>                             <td width="1" bgcolor="#f3f4f9"></td>                             <td width="1" bgcolor="#fafbfc"></td>                         </tr>                         <tr>                             <td colspan="11"><img src="http://mail.yimg.com/nq/assets/sharedmessages/v1/all/b2.gif" width="196" height="8" style="display:block;" border="0"></td>                         </tr>                         <tr><td height="13"></td></tr>                     </table>                     </td>                     <td width="10"></td>                 </tr>             </table>             </td>         </tr>         <tr>             <td width="498" height="1" bgcolor="#c7c4ca"></td>         </tr>     </table>     <table cellpadding="0" cellspacing="0" width="500">         <tr>             <td align="center" valign="top">             <table cellspacing="0" cellpadding="0"><tr><td height="10"></td></tr></table>                 <div style="font-family:Arial, Helvetica, 
sans-serif;font-size:11px;line-height:18px;margin-bottom:10px;">                 <a rel="nofollow" target="_blank" href="http://info.yahoo.com/legal/se/yahoo/utos.html" style="text-decoration:underline;color:#61399d;">Yahoo! Villkor för användning</a>&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;<a rel="nofollow" target="_blank" href="http://info.yahoo.com/legal/se/yahoo/mail/atos.html" style="text-decoration:underline;color:#61399d;">Yahoo! Mail –Villkor för användning</a>&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;<a rel="nofollow" target="_blank" href="http://info.yahoo.com/privacy/se/yahoo/details.html" style="text-decoration:underline;color:#61399d;">Yahoo! Sekretesspolicy</a>                 </div>             </td>         </tr>         <tr>             <td align="left" valign="top">                 <div style="font-family:Arial, Helvetica, sans-serif;font-size:11px;line-height:14px;color:#545454;margin-left:16px;margin-right:14px;">Var god svara inte på detta meddelande. Detta är ett servicemeddelande som rör din användning av Yahoo! Mail. Om du vill veta mer om Yahoo!s användning av personlig information, inklusive användning av webb-beacons i HTML-baserad e-post, kan du läsa vår Yahoo! Sekretesspolicy. Yahoo!s adress är 701 First Avenue, Sunnyvale, CA 94089, USA.<br /><br />RefID: lp-1037111</div>             </td>         </tr>     </table>          </td> </tr> </table> <img width="1" height="1" src="http://pclick.internal.yahoo.com/p/s=2143684696"> </body> </html>` 

The Python code that tries to extract the data is as follows:

    >>> import sqlite3
    >>> conn = sqlite3.connect('C:/temp/Mobils/export/com.yahoo.mobile.client.android.mail/databases/mail.db')
    >>> c = conn.cursor()
    >>> conn.row_factory=sqlite3.Row
    >>> c.execute('select body from messages_1 where _id=7')
    <sqlite3.Cursor object at 0x0000000001FB78F0>
    >>> r = c.fetchone()
    >>> r.keys()
    ['body']
    >>> print(r['body'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python32\lib\encodings\cp850.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_map)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>
    >>>

Does anybody have an idea how to print this or write it to a file? Yes, I know the example above prints to stdout, but I get the same UnicodeEncodeError when I try to write to a file. I tried both the write method of a file object and print(r['body'], file=f).

Mattias asked May 02 '13 20:05


2 Answers

When you open the file you want to write to, open it with a specific encoding that can handle all the characters.

    with open('filename', 'w', encoding='utf-8') as f:
        print(r['body'], file=f)
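As a minimal sketch of why this works: the cp850 code page named in the traceback has no mapping for U+2013 (en dash), while UTF-8 can encode every Unicode character. The sample string, the temporary output path, and the variable names below are illustrative assumptions, not part of the original code.

```python
import os
import tempfile

# Stand-in for r['body']: text containing U+2013, the character from the traceback.
body = "Yahoo! Mail \u2013Villkor för användning"

# cp850 cannot represent U+2013, so encoding raises UnicodeEncodeError.
try:
    body.encode("cp850")
    cp850_ok = True
except UnicodeEncodeError:
    cp850_ok = False

# An explicit UTF-8 encoding does not depend on the platform default.
path = os.path.join(tempfile.mkdtemp(), "body.html")  # hypothetical output path
with open(path, "w", encoding="utf-8") as f:
    print(body, file=f)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Whatever later reads the file has to open it as UTF-8 as well, otherwise the non-ASCII characters will be misinterpreted.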
Mark Ransom answered Sep 21 '22 14:09


Maybe a little late to reply, but I ran into the same problem today. I found that on Windows you can change the console encoding to UTF-8, or to another encoding that can represent your data. Then you can print it to sys.stdout.

First, run the following commands in the console:

    chcp 65001
    set PYTHONIOENCODING=utf-8

Then start Python and do anything you want.
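If changing the console code page is not an option, one possible alternative (not part of the answer above) is to escape whatever the stream's encoding cannot represent instead of letting print raise. This is only a sketch; the helper name print_lossy is hypothetical.

```python
import sys

def print_lossy(text, stream=None):
    """Write text to a stream, escaping characters its encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    # Fall back to UTF-8 if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # backslashreplace turns unencodable characters into \uXXXX escapes
    # instead of raising UnicodeEncodeError.
    data = text.encode(encoding, errors="backslashreplace")
    # Decoding again yields a string the stream is able to encode.
    stream.write(data.decode(encoding) + "\n")
```

On a cp850 console, print_lossy(r['body']) would show the en dash as \u2013 rather than failing. The output is lossy, so writing to a UTF-8 file is preferable when the exact text matters.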

Kattern answered Sep 19 '22 14:09