
Arabic, Unicode and files in Python

I am trying to grab some text written in Arabic from YouTube, write it to a file, and read it back.

The script that grabs the text has:

#!/usr/bin/python
#encoding: utf-8

at the beginning of the file.

The text is written like this:

f.write(comment + '\n' )

The file contents are readable Arabic, so I assume the previous steps are correct.

But the problem appears when I try to read the contents back from the file (and, for example, write them into another file) like this:

infile = open('data_Pass1/EG', 'rb')  # 'in' is a reserved word, so the handle is named infile here
out.write(infile.read())

This results in an output file like this:

\xd8\xa7\xd9\x8a\xd9\x87

What is causing this?

Betamoo asked Jun 13 '13 17:06


1 Answer

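In Python 3.x, open both files in text mode with an explicit encoding:

# 'in' is a reserved word in Python, so the input handle needs another name
infile = open('data_Pass1/EG', 'r', encoding='utf-8')
out = open('_file_name_', 'w', encoding='utf-8')
out.write(infile.read())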
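To see the whole round trip in one place, here is a minimal Python 3 sketch. It assumes a temporary directory in place of data_Pass1, and the names src_path, dst_path and sample_text are made up for illustration; the sample text is the three Arabic letters whose UTF-8 bytes appear in the question.

```python
import os
import tempfile

# The three Arabic letters whose UTF-8 bytes are shown in the question
sample_text = "\u0627\u064a\u0647"

# Hypothetical stand-ins for data_Pass1/EG and the output file
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "EG")
dst_path = os.path.join(workdir, "EG_copy")

# Write the text as UTF-8
with open(src_path, "w", encoding="utf-8") as f:
    f.write(sample_text + "\n")

# Read it back as text (not bytes) and write it to another file
with open(src_path, "r", encoding="utf-8") as infile, \
     open(dst_path, "w", encoding="utf-8") as out:
    out.write(infile.read())

# Check that the copy holds the same characters
with open(dst_path, "r", encoding="utf-8") as f:
    copied = f.read()

print(copied == sample_text + "\n")
```

The with blocks close the files automatically, which also makes sure the written data is flushed to disk before it is read again.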

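In Python 2.x, use codecs.open to get the same behaviour:


import codecs
# 'in' is a reserved word in Python, so the input handle needs another name
infile = codecs.open('data_Pass1/EG', 'r', encoding='utf-8')
out = codecs.open('_file_name_', 'w', encoding='utf-8')
out.write(infile.read())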
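As for the \xd8\xa7\xd9\x8a\xd9\x87 in the question: those escapes are the UTF-8 bytes of the Arabic text, displayed the way Python shows a byte string. The exact line that produced them is not shown in the question, so the following is only a sketch of the relationship between the bytes and the text (the names raw_bytes and decoded are made up for illustration):

```python
# The byte sequence from the question's output, as a bytes literal
raw_bytes = b"\xd8\xa7\xd9\x8a\xd9\x87"

# Decoding as UTF-8 turns the six bytes into three Arabic characters
decoded = raw_bytes.decode("utf-8")

print(len(raw_bytes))   # number of bytes
print(len(decoded))     # number of characters
print(decoded == u"\u0627\u064a\u0647")

# Encoding the text again gives back the same bytes
print(decoded.encode("utf-8") == raw_bytes)
```

Each of these Arabic letters takes two bytes in UTF-8, which is why reading with 'rb' hands back six bytes rather than three characters. Opening the file with an encoding, as above, does the decoding for you.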

shantanoo answered Sep 29 '22 06:09