
Wrong encoding in filenames created on Windows XP by Python script

My Python script creates an XML file under Windows XP, but the filename does not get the right encoding for Spanish characters such as 'ñ' or some accented letters.

First, the filename is read from an Excel sheet with the following code (I use the xlrd library to read the Excel file):

filename = excelsheet.cell_value(rowx=first_row, colx=5)

Then I tried several encodings to generate the file with the right encoding, without success:

filename = filename[:-1].encode("utf-8")
filename = filename[:-1].encode("latin1")
filename = filename[:-1].encode("windows-1252")

Using "windows-1252" I get a bad encoding for the letters 'ñ', 'í' and 'é'. For example, I get a garbled name instead of BAJO ARAGÓN_Alcañiz.xml.

Thanks in advance for your help

user1632979 asked Jun 14 '26 13:06



1 Answer

You should use unicode strings for your filenames. In general, operating systems support filenames that contain arbitrary Unicode characters. So if you do:

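fn = u'ma\u00f1o'  # maño (U+00F1 is the lowercase ñ)
f = open(fn, "w")  # create the file using the unicode name
f.close()
f = open(fn, "r")  # reopen it using the same unicode name
f.close()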

it should work just fine. What you see in your terminal when you list the contents of the directory where that file lives is a different matter. If the terminal's encoding is UTF-8 you will see the filename maño, but if it is, for instance, iso-8859-1 you will see maÃ±o. Even if you see these strange characters, you should still be able to open the file from Python as described above.
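To illustrate the display issue, here is a small sketch that has nothing to do with the file system itself. It encodes the name as UTF-8 and then decodes the same bytes with two different encodings, which is roughly what two differently configured terminals would do. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
name = u'ma\u00f1o'  # maño

# The bytes a UTF-8 system would hand to the terminal.
utf8_bytes = name.encode('utf-8')  # b'ma\xc3\xb1o'

# A terminal that assumes iso-8859-1 turns each byte into one character.
misread = utf8_bytes.decode('iso-8859-1')

# A terminal that assumes UTF-8 recovers the original name.
roundtrip = utf8_bytes.decode('utf-8')

print(repr(misread))
print(repr(roundtrip))
```

The name stored on disk is the same in both cases; only the interpretation of its bytes differs.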

In summary, do not encode the output of

filename = excelsheet.cell_value(rowx=first_row, colx=5)

instead make sure it is a unicode string.
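As a minimal sketch of that advice, the example below keeps the name as a unicode string and only encodes the file's contents. The helper write_xml, the temporary directory, and the hard-coded cell_text (a stand-in for the value returned by excelsheet.cell_value) are assumptions made for illustration, not part of the original script.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_xml(directory, cell_text, content):
    """Hypothetical helper: create '<cell_text>.xml' without encoding the name."""
    # The file name stays a unicode string; no .encode() call here.
    path = os.path.join(directory, cell_text + u'.xml')
    # Only the file *contents* are encoded, via the encoding argument.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(content)
    return path


# Stand-in for excelsheet.cell_value(rowx=first_row, colx=5).
cell_text = u'BAJO ARAG\u00d3N_Alca\u00f1iz'

out_dir = tempfile.mkdtemp()
path = write_xml(out_dir, cell_text,
                 u'<?xml version="1.0" encoding="utf-8"?>\n<root/>\n')
names = os.listdir(out_dir)
print(repr(names))
```

If the value read from the sheet turns out to be a byte string rather than a unicode string, it would need to be decoded with the correct source encoding before being used this way.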

The Unicode filenames section of the Python Unicode HOWTO may also be helpful.

Vicent answered Jun 16 '26 04:06



