Ansi to UTF-8 using python causing error

Question

While trying to write a Python program that converts ANSI to UTF-8, I found this question:

https://stackoverflow.com/questions/14732996/how-can-i-convert-utf-8-to-ansi-in-python

which converts UTF-8 to ANSI.

I thought it would work if I just reversed the order, so I wrote:

file_path_ansi = "input.txt"
file_path_utf8 = "output.txt"

# open the original file and decode its content
file_source = open(file_path_ansi, mode='r', encoding='latin-1', errors='ignore')
file_content = file_source.read()
file_source.close()

# write the content back out as UTF-8
file_target = open(file_path_utf8, mode='w', encoding='utf-8')
file_target.write(file_content)
file_target.close()

But it causes an error:

TypeError: file() takes at most 3 arguments (4 given)

So I changed

file_source = open(file_path_ansi, mode='r', encoding='latin-1', errors='ignore')

to

file_source = open(file_path_ansi, mode='r', encoding='latin-1')

Then it causes another error:

TypeError: 'encoding' is an invalid keyword argument for this function

How should I fix my code to solve this problem?

Martijn Pieters · Accepted Answer

You are trying to use the Python 3 version of the open() function on Python 2. Between the major versions, I/O support was overhauled, adding better support for encoding and decoding.

In Python 2, the same new version is available as io.open().

I'd use the shutil.copyfileobj() function to do the copying, so you don't have to read the whole file into memory:

import io
import shutil

with io.open(file_path_ansi, encoding='latin-1', errors='ignore') as source:
    with io.open(file_path_utf8, mode='w', encoding='utf-8') as target:
        shutil.copyfileobj(source, target)
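To check the conversion on a small sample, something like the following sketch could be used. The helper name convert_to_utf8, the temporary directory, and the sample bytes are assumptions made for illustration; they are not part of the original question or answer.

```python
import io
import os
import shutil
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    # Hypothetical helper: decode the source with src_encoding and
    # re-encode it as UTF-8, streaming instead of reading it all at once.
    with io.open(src_path, encoding=src_encoding) as source:
        with io.open(dst_path, mode="w", encoding="utf-8") as target:
            shutil.copyfileobj(source, target)


# Work in a throwaway directory so no real files are touched.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "input.txt")
dst = os.path.join(tmp_dir, "output.txt")

# "café" encoded as Latin-1: the é is the single byte 0xE9.
with io.open(src, mode="wb") as f:
    f.write(b"caf\xe9")

convert_to_utf8(src, dst)

# Read the result back as raw bytes to inspect the encoding.
with io.open(dst, mode="rb") as f:
    result = f.read()

shutil.rmtree(tmp_dir)

# In UTF-8 the é becomes the two bytes 0xC3 0xA9.
print(result == b"caf\xc3\xa9")
```

Reading the output in binary mode makes the check independent of the platform's default encoding.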

Be careful though; most people talking about ANSI refer to one of the Windows codepages; you may really have a file in CP (codepage) 1252, which is almost, but not quite the same thing as ISO-8859-1 (Latin 1). If so, use cp1252 instead of latin-1 as the encoding parameter.
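The difference between the two codecs shows up in the byte range 0x80 to 0x9F. A minimal sketch, using sample bytes chosen here only for illustration:

```python
# Byte 0x80 is a printable character in CP1252 but a control code in Latin-1.
raw = b"\x80"
as_latin1 = raw.decode("latin-1")   # U+0080, an invisible control character
as_cp1252 = raw.decode("cp1252")    # U+20AC, the euro sign

print(as_latin1 == u"\x80")
print(as_cp1252 == u"\u20ac")

# Latin-1 maps every byte to a code point, so decoding never fails.
# Python's cp1252 codec leaves a few bytes (such as 0x81) undefined,
# so strict decoding raises and errors='ignore' silently drops the byte.
undefined = b"\x81"
try:
    undefined.decode("cp1252")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = undefined.decode("cp1252", errors="ignore")

print(strict_failed)
print(ignored == u"")
```

If the euro sign, curly quotes, or dashes in the converted file come out as invisible control characters, the source was most likely CP1252 rather than Latin-1.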

Tags:

python

character-encoding

utf-8

user3123767
