Python opens text file with a space between every character

Whenever I open a .csv file in Python with fread = open('input.csv', 'r'), the contents come back with a space between every single character. I'm guessing something is wrong with the file itself, because other text files opened with the same command load correctly. Does anyone know why a text file would load like this in Python?

Thanks.

Update

OK, I got it working with the help of Jarret Hardie's post.

This is the code I used to convert the file to ASCII:

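    # Read the raw bytes, decode them as UTF-16, then re-encode as ASCII.
    fread = open('input.csv', 'rb').read()
    mytext = fread.decode('utf-16')
    # 'ignore' silently drops any character that has no ASCII equivalent.
    mytext = mytext.encode('ascii', 'ignore')
    fwrite = open('input-ascii.csv', 'wb')
    fwrite.write(mytext)
    fwrite.close()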

Thanks!

asked Mar 02 '09 17:03

wlindner

2 Answers

The post by recursive is probably right: the contents of the file are likely encoded with a multi-byte charset. If that is the case, you can read the file in Python itself without converting it outside of Python first.

Try something like:

    fread = open('input.csv', 'rb').read()
    mytext = fread.decode('utf-16')

The 'b' flag ensures the file is read as binary data, so the bytes reach you unmodified. You'll need to know (or guess) the original encoding; in this example I've used utf-16, but YMMV. The decode call converts the bytes to a unicode string. If the file really does contain multi-byte characters, I don't recommend converting it to ASCII, as you may lose every character that falls outside the ASCII range.
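As a rough sketch of the same idea on Python 3 (an assumption; the snippet above dates from Python 2), open() accepts an encoding argument, so the decoding can happen while the file is read. The helper name read_utf16_csv and the sample file contents are made up for illustration, and utf-16 is still only a guess at the real encoding:

```python
import csv
import os
import tempfile


def read_utf16_csv(path, encoding="utf-16"):
    """Return all rows of a CSV file stored in the given encoding."""
    # newline="" leaves line-ending handling to the csv module.
    with open(path, "r", encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


# Build a small UTF-16 sample file so the sketch runs on its own.
# "\u00eb" is an e with diaeresis, a character that ASCII cannot represent.
sample_path = os.path.join(tempfile.mkdtemp(), "input.csv")
with open(sample_path, "w", encoding="utf-16", newline="") as handle:
    handle.write("ID,Name\r\n1,Zo\u00eb\r\n")

rows = read_utf16_csv(sample_path)
print(rows)

# Converting to ASCII with 'ignore' silently drops the non-ASCII character.
lossy = rows[1][1].encode("ascii", "ignore")
print(lossy)  # b'Zo'
```

The last two lines show why converting to ASCII is risky: the accented character disappears without any error.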

EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested; in the text column you'll see something like 'I.D.|.' (etc.). The dot is the extra byte that accompanies each character.
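If you don't have a hex editor handy, the same check can be sketched in Python. This assumes Python 3 and assumes the two leading bytes are a UTF-16 byte order mark; detect_utf16_bom is a hypothetical helper, and the sample bytes are built in the script rather than taken from the uploaded file:

```python
import codecs


def detect_utf16_bom(raw):
    """Return 'utf-16-le', 'utf-16-be', or None based on the first two bytes."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


# Little-endian UTF-16 with a byte order mark, similar to the 'I.D.|.' view.
raw = codecs.BOM_UTF16_LE + "ID|".encode("utf-16-le")
print(raw)                    # b'\xff\xfeI\x00D\x00|\x00'
print(detect_utf16_bom(raw))  # utf-16-le

# Plain ASCII bytes carry no byte order mark.
print(detect_utf16_bom(b"ID|"))  # None
```

The \x00 after each letter is the extra byte per character, which is what appears as a space or a dot when the file is read as single-byte text. One caveat: a UTF-32 little-endian mark begins with the same two bytes, so treat the result as a hint, not proof.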

The code snippet above seems to work on my machine with that file.

answered Oct 21 '22 06:10

Jarret Hardie

The file is encoded in some Unicode encoding, but you are reading it as ASCII. Try converting the file to ASCII before using it in Python.

answered Oct 21 '22 05:10

recursive
