
Python: Unicode source file adds spaces (actually null bytes) between characters

I am a newbie. However, I managed to extract some lines from a Unicode text file and write them to another file.

lines = InFile.readlines()
OutFile.writelines(lines[3:])

It works, but (I believe) due to an encoding issue a space is added between each character in the output file. Example of the result:

2 0 1 3 - 1 2 - 2 3 ; ; 3 6 0 . 3 7 
2 0 1 3 - 1 2 - 2 4 ; ; 0 . 0 0 

Lines in the source file:

2013-12-23;;360.37
2013-12-24;;0.00

If I save the source file as ANSI before running the script, I get the correct results. However, the source file is delivered automatically as Unicode by another program, so changing the encoding by hand every time is not practical. I have read through many other questions on encoding and decoding, but I am completely lost and don't know how to fix this. Which command is correct, and where in the script does it belong? Or am I wrong, and this has nothing to do with encoding at all?

user3037270 asked Nov 27 '13 18:11


1 Answer

I'm fairly certain that your input file is UTF-16 encoded, and the spaces you're seeing are actually null bytes.
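To illustrate why that would look like extra spaces, here is a small sketch. It assumes little-endian UTF-16 and uses a sample line from the question; the variable names are only for illustration. Each ASCII character becomes two bytes in UTF-16, and the second byte is zero:

```python
# Sample line taken from the question's source file.
text = "2013-12-23;;360.37"

# Encode as UTF-16 little-endian (no byte order mark) to inspect the raw bytes.
raw = text.encode("utf-16-le")
print(raw[:8])  # b'2\x000\x001\x003\x00'

# Reading those bytes with a one-byte-per-character encoding keeps the nulls,
# which many editors display as blanks between the characters.
misread = raw.decode("latin-1")
print(repr(misread[:8]))

# Decoding with the matching codec recovers the original text.
print(raw.decode("utf-16-le") == text)
```

A file written by another program will usually also start with a byte order mark, which the plain "utf-16" codec detects and strips on read.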

Try

with open("myfile.txt", "r", encoding="utf-16") as infile:
    lines = infile.readlines()

and see if the problem persists.
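For completeness, a minimal end-to-end sketch of the read-and-write step from the question, assuming Python 3 and assuming the input really is UTF-16. The file names, the three header lines, and the choice of UTF-8 for the output are all made up for the demo; only the two data lines come from the question:

```python
import os
import tempfile

# Hypothetical file names, created in a temporary directory for this demo.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "source_utf16.txt")
out_path = os.path.join(tmpdir, "result.txt")

# Build a UTF-16 sample: three assumed header lines, then the question's data lines.
sample = "header 1\nheader 2\nheader 3\n2013-12-23;;360.37\n2013-12-24;;0.00\n"
with open(in_path, "w", encoding="utf-16") as f:
    f.write(sample)

# Decode while reading, so the lines are ordinary strings without null bytes.
with open(in_path, "r", encoding="utf-16") as infile:
    lines = infile.readlines()

# Skip the first three lines and state the output encoding explicitly.
with open(out_path, "w", encoding="utf-8") as outfile:
    outfile.writelines(lines[3:])

# Check the raw output bytes: no null bytes should remain.
with open(out_path, "rb") as f:
    result = f.read()
print(b"\x00" in result)
```

On Python 2, the built-in open() does not accept an encoding argument; io.open() takes the same parameter and can be used in its place.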

Tim Pietzcker answered Sep 30 '22 13:09