Let's assume that I need to write and then read a list of strings with Polish words in a .csv file in Python 3.6:
lista=['szczęśliwy','jabłko','słoń','kot']
Since it's not possible to write Unicode characters in the .csv, I encode the strings to utf-8, so data is saved like this in the file (all inside the first .csv cell):
b'szcz\xc4\x99\xc5\x9bliwy',b'jab\xc5\x82ko',b's\xc5\x82o\xc5\x84',b'kot'
But I am not able to decode the data from the output.csv file using this code:
import csv

with open('output.csv') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        lista2 = reg
print(lista2)
["b'szcz\\xc4\\x99\\xc5\\x9bliwy'", "b'jab\\xc5\\x82ko'", "b's\\xc5\\x82o\\xc5\\x84'", "b'kot'"]
lista2 is still a list of strings, but each string holds the text of the UTF-8-encoded bytes, and I am not able to recover the special characters.
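The b'...' text most likely comes from csv.writer itself: it passes non-string values through str(), and str() of a bytes object is its literal representation, not the decoded text. A small sketch reproducing this, with an in-memory io.StringIO buffer standing in for output.csv (the buffer is only an assumption for the demo):

```python
import csv
import io

words = ['szczęśliwy', 'jabłko', 'słoń', 'kot']

# Mimic the question's approach: encode every word to UTF-8 bytes first.
encoded = [w.encode('utf-8') for w in words]

buffer = io.StringIO(newline='')
# csv.writer stringifies non-str values with str(), so each bytes object
# is written as its repr, e.g. b'jab\xc5\x82ko'.
csv.writer(buffer).writerow(encoded)
print(buffer.getvalue())

# Reading it back just returns that repr text as an ordinary str.
buffer.seek(0)
row = next(csv.reader(buffer))
print(type(row[1]))  # <class 'str'>
```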
I tried several things, like reading the file in 'rb' mode and encoding and decoding again, but since I am new to this I couldn't make it work. There must be a very easy solution.
Open the file with an explicit encoding and with newline='' (this applies to the Python csv module). So, assuming your CSV file is UTF-8-encoded, use:
import csv

with open('output.csv', 'r', encoding='UTF-8', newline='') as csvarchive:
    entrada = csv.reader(csvarchive)
    for reg in entrada:
        # do something with the data row, it's already decoded
        print(reg)
The same applies to writing the file:
import csv

with open('output.csv', 'w', encoding='UTF-8', newline='') as csvarchive:
    writer = csv.writer(csvarchive)
    # write data to the writer, it will be encoded automatically
    writer.writerow(lista)
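A minimal end-to-end sketch of writing the question's list and reading it back with encoding='UTF-8' and newline=''. It uses a temporary directory from tempfile instead of a real output.csv; that path handling is an assumption made only so the demo is self-contained:

```python
import csv
import os
import tempfile

lista = ['szczęśliwy', 'jabłko', 'słoń', 'kot']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.csv')

    # Write plain str values; the file object encodes them as UTF-8.
    with open(path, 'w', encoding='UTF-8', newline='') as csvarchive:
        csv.writer(csvarchive).writerow(lista)

    # Read them back; the file object decodes the UTF-8 bytes into str.
    with open(path, 'r', encoding='UTF-8', newline='') as csvarchive:
        lista2 = next(csv.reader(csvarchive))

    # Peek at the raw bytes on disk to see the UTF-8 encoding.
    with open(path, 'rb') as raw:
        on_disk = raw.read()

print(lista2)         # ['szczęśliwy', 'jabłko', 'słoń', 'kot']
print(lista2 == lista)  # True
print(on_disk[:10])   # b'szcz\xc4\x99\xc5\x9bli'
```

The strings go in and come out as str; the UTF-8 bytes only exist inside the file.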
There is no need to do any manual string encoding. Write str values to the csv writer; the file object encodes them transparently.
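If output.csv has already been written the way the question describes, each cell holds the text of a Python bytes literal. One possible way to salvage such a file (a swapped-in technique, not part of the answer above) is to parse each cell with ast.literal_eval, which only accepts literals and turns the b'...' text back into a bytes object, and then decode that as UTF-8. The helper name recover_cell is hypothetical, and io.StringIO stands in for the damaged file:

```python
import ast
import csv
import io

def recover_cell(cell):
    """Hypothetical helper: turn the text "b'...'" back into the original str."""
    value = ast.literal_eval(cell)  # text b'jab\xc5\x82ko' -> bytes object
    return value.decode('utf-8')    # bytes -> 'jabłko'

# Stand-in for the damaged output.csv; with a real file, use
# open('output.csv', encoding='UTF-8', newline='') instead.
damaged = io.StringIO(
    "b'szcz\\xc4\\x99\\xc5\\x9bliwy',b'jab\\xc5\\x82ko',b's\\xc5\\x82o\\xc5\\x84',b'kot'\r\n",
    newline='',
)

rows = [[recover_cell(cell) for cell in reg] for reg in csv.reader(damaged)]
print(rows[0])  # ['szczęśliwy', 'jabłko', 'słoń', 'kot']
```

After recovering the strings, rewrite the file with the UTF-8 writer so the workaround is needed only once.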