I have some CSV files need to convert from shift-jis to utf-8.
Here is my code in PHP, which is successful transcode to readable text.
$str = utf8_decode($str);
$str = iconv('shift-jis', 'utf-8'. '//TRANSLIT', $str);
echo $str;
My problem is how to do same thing in Python.
universaldetector import UniversalDetector targetFormat = 'utf-8' outputDir = 'converted' detector = UniversalDetector() def get_encoding_type(current_file): detector. reset() for line in file(current_file): detector. feed(line) if detector. done: break detector.
Using the string encode() method, you can convert unicode strings into any encodings supported by Python. By default, Python uses utf-8 encoding.
The iconv() function in R is used to convert given character vectors between encodings.
I don't know PHP, but does this work :
mystring.decode('shift-jis').encode('utf-8') ?
Also I assume the CSV content is from a file. There are a few options for opening a file in python.
with open(myfile, 'rb') as fin
would be the first and you would get data as it is
with open(myfile, 'r') as fin
would be the default file opening
Also I tried on my computed with a shift-js text and the following code worked :
with open("shift.txt" , "rb") as fin :
text = fin.read()
text.decode('shift-jis').encode('utf-8')
result was the following in UTF-8 (without any errors)
' \xe3\x81\xa6 \xe3\x81\xa7 \xe3\x81\xa8'
Ok I validate my solution :)
The first char is indeed the good character: "\xe3\x81\xa6" means "E3 81 A6" It gives the correct result.
You can try yourself at this URL
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With