Is there any function like iconv in Python?

I have some CSV files that I need to convert from Shift-JIS to UTF-8.

Here is my PHP code, which successfully transcodes them to readable text.

$str = utf8_decode($str);
$str = iconv('shift-jis', 'utf-8'. '//TRANSLIT', $str);
echo $str;

My problem is how to do the same thing in Python.

asked Jun 05 '15 08:06 by hugowan


1 Answer

I don't know PHP, but does this work?

mystring.decode('shift-jis').encode('utf-8')

Here mystring is assumed to be a byte string, as it would be in Python 2.
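A minimal sketch of the same idea for Python 3, where decode() exists only on bytes objects. The function name sjis_to_utf8 and the sample text are illustrative assumptions, not part of the question:

```python
def sjis_to_utf8(raw: bytes) -> bytes:
    """Re-encode Shift-JIS bytes as UTF-8 bytes."""
    # Decode the raw bytes into a str first, then encode to the target encoding.
    return raw.decode("shift_jis").encode("utf-8")


# Sample data (assumption): the hiragana "て" encoded as Shift-JIS.
sample = "て".encode("shift_jis")
converted = sjis_to_utf8(sample)
print(converted)  # b'\xe3\x81\xa6'
```

The intermediate str is the important part: Python converts between encodings by going through Unicode text, rather than mapping bytes to bytes directly as iconv does.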

Also, I assume the CSV content comes from a file. There are a few options for opening a file in Python.

with open(myfile, 'rb') as fin:

is the first; it gives you the data as raw bytes, exactly as stored.

with open(myfile, 'r') as fin:

is the default text mode.
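In Python 3, text mode can also do the decoding itself if an encoding is passed to open(). The following is a sketch under that assumption; convert_file and the file names are hypothetical, and the sample CSV content is made up for the demonstration:

```python
import os
import tempfile


def convert_file(src_path: str, dst_path: str) -> None:
    """Read a Shift-JIS text file and write it back out as UTF-8."""
    # newline="" leaves line endings untouched, which matters for CSV data.
    with open(src_path, "r", encoding="shift_jis", newline="") as fin:
        text = fin.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)


# Usage with throwaway files (hypothetical names and sample content).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "input_sjis.csv")
dst = os.path.join(tmp_dir, "output_utf8.csv")
with open(src, "wb") as f:
    f.write("名前,値\r\nて,1\r\n".encode("shift_jis"))
convert_file(src, dst)
```

Reading the whole file at once is fine for small CSV files; for large ones, iterating over fin line by line and writing each line out would keep memory use flat.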

I also tried this on my computer with a Shift-JIS text file, and the following code worked:

with open("shift.txt", "rb") as fin:
    text = fin.read()

text.decode('shift-jis').encode('utf-8')

The result was the following UTF-8 byte string (without any errors):

' \xe3\x81\xa6 \xe3\x81\xa7 \xe3\x81\xa8'
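The PHP snippet in the question passes //TRANSLIT to iconv. As far as I know, Python's built-in codecs have no direct equivalent, but the errors argument of decode() controls what happens to bytes that are not valid Shift-JIS. A rough sketch, with deliberately damaged sample input (an assumption for illustration):

```python
# Sample input: valid Shift-JIS for "て" followed by a stray invalid byte.
damaged = "て".encode("shift_jis") + b"\xff"

# The default (strict) mode raises on invalid input.
try:
    damaged.decode("shift_jis")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
lenient = damaged.decode("shift_jis", errors="replace")
utf8_out = lenient.encode("utf-8")
```

This is not transliteration, only a way to keep the conversion from aborting on bad input.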

OK, I validated my solution :)

The first character is indeed correct: "\xe3\x81\xa6" is the byte sequence E3 81 A6, which is the UTF-8 encoding of "て" (U+3066).
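The byte check can be reproduced in a Python 3 session. A small sketch; the variable names are illustrative:

```python
# The UTF-8 output reported in the answer, written as a bytes literal.
utf8_bytes = b" \xe3\x81\xa6 \xe3\x81\xa7 \xe3\x81\xa8"

# Decoding as UTF-8 should succeed and give readable hiragana.
decoded = utf8_bytes.decode("utf-8")
print(decoded)

# List the Unicode code point of each non-space character.
code_points = [f"U+{ord(ch):04X}" for ch in decoded if ch != " "]
print(code_points)  # ['U+3066', 'U+3067', 'U+3068']
```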

You can try it yourself at this URL

answered Sep 21 '22 02:09 by Azurtree