How to write Chinese characters to a file in Python

Tags:

python

I'm walking through a directory and want to write all the file names into a file. Here's the piece of code:

with open("c:/Users/me/filename.txt", "a") as d:
   for dir, subdirs, files in os.walk("c:/temp"):
      for f in files:
         fname = os.path.join(dir, f)
         print fname
         d.write(fname + "\n")
d.close()

The problem is that some of the files have names in Chinese characters. With print, the file names show up correctly in the console, but in the target file they come out garbled. I've tried opening the file with open(u"c:/Users/me/filename.txt", "a"), but that did not work. I also tried writing fname.decode("utf-16"), which does not work either.

Bomin asked Feb 29 '16 15:02



2 Answers

In Python 2, it's a good idea to use codecs.open() if you're dealing with encodings other than ASCII. That way, you don't need to manually encode everything you write. Also, os.walk() should be passed a Unicode string if you're expecting non-ASCII characters in the filenames:

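import codecs
import os

# codecs.open() encodes everything written to the file as UTF-8
with codecs.open("c:/Users/me/filename.txt", "a", encoding="utf-8") as d:
   # a Unicode path makes os.walk() return Unicode file names
   for dir, subdirs, files in os.walk(u"c:/temp"):
      for f in files:
         fname = os.path.join(dir, f)
         print fname
         d.write(fname + "\n")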

There is no need to call d.close(); the with block already takes care of that.
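As a hedged variant of the same idea: io.open() takes the same encoding argument as codecs.open(), and in Python 3 it is the built-in open(), so the sketch below should run under either version. The function name write_file_list and its parameters are hypothetical, chosen only for illustration.

```python
import io
import os


def write_file_list(root, out_path):
    """Append the full path of every file under root to out_path as UTF-8.

    Hypothetical helper, not part of the original answer. root should be a
    unicode string so that os.walk() yields unicode file names.
    """
    count = 0
    # io.open() encodes text to UTF-8 on write, so no manual .encode() call
    # is needed.
    with io.open(out_path, "a", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                out.write(os.path.join(dirpath, name) + u"\n")
                count += 1
    return count
```

With the paths from the question, this would be called as write_file_list(u"c:/temp", u"c:/Users/me/filename.txt"). The output file should live outside the directory being walked, otherwise it ends up listing itself.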

Tim Pietzcker answered Nov 02 '22 03:11


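Use encode() to encode fname before you write it to the file. In Python 2 this only works if fname is a unicode object, which is what os.walk() yields when it is given a unicode path; calling encode() on a byte string containing non-ASCII characters raises UnicodeDecodeError: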

d.write(fname.encode('utf8') + '\n')
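A minimal sketch of this manual-encoding route, assuming the name is already a unicode string: encode it to UTF-8 bytes and write those bytes to a file opened in binary mode, which behaves the same way in Python 2 and Python 3. The helper name append_line is hypothetical.

```python
def append_line(path, text):
    """Append text plus a newline to path as UTF-8 bytes (hypothetical helper)."""
    data = (text + u"\n").encode("utf-8")
    # Binary mode: the bytes are written exactly as encoded, with no
    # newline translation and no implicit re-encoding.
    with open(path, "ab") as out:
        out.write(data)
    return len(data)
```

The return value is the number of bytes written, which will be larger than the number of characters whenever the text contains Chinese characters, since each of those takes several bytes in UTF-8.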
pp_ answered Nov 02 '22 02:11