I have this code:
printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
# Write file
f.write (printinfo + '\n')
But I get this error when running it:
f.write(printinfo + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)
It's having trouble writing out this:
Identité secrète (Abduction) [VF]
Any ideas please, not sure how to fix.
Cheers.
UPDATE: This is the bulk of my code, so you can see what I am doing:
def runLookupEdit(self, event):
    newpath1 = pathindir + "/"
    errorFileOut = newpath1 + "REPORT.csv"
    f = open(errorFileOut, 'w')

    global old_vendor_id

    for old_vendor_id in vendorIdsIn.splitlines():
        writeErrorFile = 0
        from lxml import etree
        parser = etree.XMLParser(remove_blank_text=True) # makes pretty print work

        path1 = os.path.join(pathindir, old_vendor_id)
        path2 = path1 + ".itmsp"
        path3 = os.path.join(path2, 'metadata.xml')

        # Open and parse the xml file
        cantFindError = 0
        try:
            with open(path3): pass
        except IOError:
            cantFindError = 1
            errorMessage = old_vendor_id
            self.Error(errorMessage)
            break
        tree = etree.parse(path3, parser)
        root = tree.getroot()

        for element in tree.xpath('//video/title'):
            title = element.text
            while '\n' in title:
                title= title.replace('\n', ' ')
            while '\t' in title:
                title = title.replace('\t', ' ')
            while '  ' in title:
                title = title.replace('  ', ' ')
            title = title.strip()
            element.text = title
            print title

        #########################################
        ######## REMOVE UNWANTED TAGS ########
        #########################################

        # Remove the comment tags
        comments = tree.xpath('//comment()')
        q = 1
        for c in comments:
            p = c.getparent()
            if q == 3:
                apple_id = c.text
            p.remove(c)
            q = q+1

        apple_id = apple_id.split(':',1)[1]
        apple_id = apple_id.strip()
        printinfo = title + "\t" + old_vendor_id + "\t" + apple_id

        # Write file
        # f.write (printinfo + '\n')
        f.write(printinfo.encode('utf8') + '\n')
    f.close()
A codec such as ASCII can represent only a limited set of Unicode characters. Any character outside that set makes the encoding step fail and raise UnicodeEncodeError. To avoid this error, call .encode('utf-8') when turning unicode into bytes and .decode('utf-8') when turning bytes back into unicode.
The UnicodeEncodeError normally happens when encoding a unicode string with a particular codec. Since a codec maps only a limited number of Unicode characters to str (byte) strings, a character it cannot represent causes that codec's encode() to fail.
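To see the failure in isolation, here is a minimal sketch that encodes the title from the question with two codecs. The title is written with escapes so the snippet does not depend on the source file's encoding. Names such as ascii_failed and failed_position are just illustrative, and the code is written so that it should run on both Python 2 and Python 3.

```python
# The title from the question, as a unicode string (\xe9 = e-acute, \xe8 = e-grave).
title = u"Identit\xe9 secr\xe8te (Abduction) [VF]"

# ASCII covers only code points 0-127, so the accented letter cannot be encoded.
try:
    title.encode("ascii")
    ascii_failed = False
    failed_position = None
except UnicodeEncodeError as exc:
    ascii_failed = True
    failed_position = exc.start  # index of the first character that could not be encoded

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = title.encode("utf-8")

# Decoding with the same codec gives back the original text.
round_trip = utf8_bytes.decode("utf-8")
```

The index stored in failed_position corresponds to the "position 7" reported in the traceback, which is the é in "Identité".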
You need to encode Unicode explicitly before writing to a file, otherwise Python does it for you with the default ASCII codec.
Pick an encoding and stick with it:
f.write(printinfo.encode('utf8') + '\n')
or use io.open() to create a file object that'll encode for you as you write to the file:
import io
f = io.open(filename, 'w', encoding='utf8')
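As a rough end-to-end sketch of the io.open() route, the snippet below writes one tab-separated report line and reads it back. The old_vendor_id and apple_id values are made-up placeholders, and the file goes into a temporary directory rather than the asker's pathindir, so treat it as an illustration and not a drop-in replacement.

```python
import io
import os
import tempfile

# Title from the question; the two IDs below are hypothetical placeholder values.
title = u"Identit\xe9 secr\xe8te (Abduction) [VF]"
old_vendor_id = u"VENDOR_0001"
apple_id = u"123456789"

# Write into a scratch directory so the example does not touch real data.
report_path = os.path.join(tempfile.mkdtemp(), "REPORT.csv")

# The file object encodes to UTF-8 on write, so no explicit .encode() is needed.
with io.open(report_path, "w", encoding="utf8") as f:
    f.write(u"\t".join([title, old_vendor_id, apple_id]) + u"\n")

# Reading back with the same encoding returns the original unicode text.
with io.open(report_path, "r", encoding="utf8") as f:
    line = f.read()
```

Note that a file opened this way expects unicode text, so pass unicode strings to write() and don't call .encode() on them first.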
You may want to read:
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
before continuing.