 

Writing/Reading special characters from CSV (Python 3.6)

Let's assume that I need to write a list of strings containing Polish words to a .csv file and then read it back, in Python 3.6:

lista=['szczęśliwy','jabłko','słoń','kot']

Since it's not possible to write Unicode characters to the .csv, I encode the strings to UTF-8, so the data is saved in the file like this (all inside the first .csv cell):

b'szcz\xc4\x99\xc5\x9bliwy',b'jab\xc5\x82ko',b's\xc5\x82o\xc5\x84',b'kot'

But I cannot decode the data from the output.csv file with this code:

import csv

with open('output.csv') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        lista2 = reg

print(lista2)
# Output:
# ["b'szcz\\xc4\\x99\\xc5\\x9bliwy'", "b'jab\\xc5\\x82ko'", "b's\\xc5\\x82o\\xc5\\x84'", "b'kot'"]

lista2 is still a list of strings, but they contain the UTF-8 byte escapes as literal text, and I cannot recover the special characters.

I tried several things, such as reading the file in 'rb' mode and encoding and decoding again, but since I am new to this I did not succeed. There must be a very easy solution.

asked Nov 02 '17 at 16:11 by Pacullamen


1 Answer

  1. Never open text files without specifying an encoding (this is generally true).
  2. Always open CSV files with newline='' (this applies to the Python csv module).

So, assuming your CSV file is UTF-8-encoded, use:

import csv

with open('output.csv', 'r', encoding='UTF-8', newline='') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        # do something with the data row; it is already decoded to str
        print(reg)

The same applies to writing the file:

import csv

with open('output.csv', 'w', encoding='UTF-8', newline='') as csvarchive:
    writer = csv.writer(csvarchive)
    # write plain str values to the writer; they are encoded automatically
    writer.writerow(['szczęśliwy', 'jabłko', 'słoń', 'kot'])
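Putting the reading and writing halves together, a round trip is a quick way to check both rules. This is a minimal sketch, assuming a temporary directory stands in for your real output.csv; the second row, with a line break inside a field, is a hypothetical addition to show why newline='' matters.

```python
import csv
import os
import tempfile

lista = ['szczęśliwy', 'jabłko', 'słoń', 'kot']
# Hypothetical extra row: a field containing a line break, which only
# round-trips reliably when the file is opened with newline=''
rows = [lista, ['dwie\nlinie', 'kot']]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.csv')

    # Write plain str values; the file object encodes them as UTF-8
    with open(path, 'w', encoding='UTF-8', newline='') as csvarchive:
        csv.writer(csvarchive).writerows(rows)

    # Read back with the same encoding and newline settings
    with open(path, 'r', encoding='UTF-8', newline='') as csvarchive:
        result = list(csv.reader(csvarchive))

    # Peek at the raw bytes on disk to confirm they are UTF-8
    with open(path, 'rb') as f:
        raw = f.read()

print(result == rows)           # True
print(b'jab\xc5\x82ko' in raw)  # True
```

The rows come back as ordinary str values, while the file on disk holds real UTF-8 bytes (for example `ł` is stored as `\xc5\x82`), with no manual encode or decode calls.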

There is no need to do any manual string encoding. Write str values to the csv writer; the file object encodes them transparently.
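The `b'...'` text in the question comes from encoding first: csv.writer stringifies non-string values with str(), and str() of a bytes object is its repr. The sketch below reproduces that in memory with io.StringIO, then shows one possible way to salvage a file that was already written this way. The salvage step goes beyond the original answer and assumes every cell is a bytes literal holding UTF-8 data; rewriting the file as shown above is the cleaner fix.

```python
import ast
import csv
import io

lista = ['szczęśliwy', 'jabłko', 'słoń', 'kot']

# What went wrong: encoding first hands bytes to csv.writer, which calls str()
# on non-string values, so each cell holds the bytes repr, e.g. b'kot'
buf = io.StringIO()
csv.writer(buf).writerow([s.encode('utf-8') for s in lista])
written = buf.getvalue()

# Reading it back yields strings that merely look like bytes literals
lista2 = next(csv.reader(io.StringIO(written, newline='')))

# Possible salvage (assumption: every cell is a bytes literal of UTF-8 data):
# ast.literal_eval parses the text back into bytes, then decode restores str
recovered = [ast.literal_eval(cell).decode('utf-8') for cell in lista2]
print(recovered)
```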

answered Sep 28 '22 at 02:09 by Tomalak