Python 3 doesn't read unicode file on a new server

My webpages are served by a script that dynamically imports a bunch of files with:

try:
    with open (filename, 'r') as f:
        exec(f.read())
except IOError: pass

(Incidentally, can you suggest a better way to import a file like this? I'm sure there is one.)

Sometimes the files have strings in different languages, like

# contents of language.ru
title = "Название"

Those were all saved as UTF-8 files. Python has no problem running the script from the command line or serving a page from my MacBook:

    OK: [server command line] python3.0 page.py /index.ru
    OK: http://whitebox.local/index.ru

but it throws an error when trying to serve a page from a server we just moved to:

      157     try:
      158         with open (filename, 'r') as f:
      159             exec(f.read())
      160     except IOError: pass
      161 
      /usr/local/lib/python3.0/io.py in read(self=, n=-1)
      ...
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 627: ordinal not in range(128) 

All the files were copied from my laptop where they were perfectly served by Apache. What is the reason?

Update: I found out that the default encoding for open() is platform-dependent, so it was utf8 on my laptop and ascii on the server. I wonder if there is a per-program way to set it in Python 3 (sys.setdefaultencoding is used in the site module and then deleted from the namespace).
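
To compare the two machines, it can help to print the encoding that text-mode open() falls back to when no encoding is passed. The sketch below assumes the fallback comes from locale.getpreferredencoding(False), as the Python 3 documentation of that era describes; the helper name default_text_encoding is made up for illustration.

```python
import codecs
import locale
import sys


def default_text_encoding():
    """Return the normalized name of the locale's preferred text encoding.

    Assumption: text-mode open() without encoding= uses this value.
    """
    name = locale.getpreferredencoding(False)
    # codecs.lookup() normalizes aliases such as "ANSI_X3.4-1968" to "ascii".
    return codecs.lookup(name).name


print("locale default for open():", default_text_encoding())
# This one describes str/bytes conversions inside Python, not file I/O.
print("sys.getdefaultencoding():", sys.getdefaultencoding())
```

Running this both from the shell and from inside the served script should show whether the web server process starts with a different locale than the login shell.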

asked Jun 11 '09 21:06 by ilya n.



3 Answers

Use open(filename, 'r', encoding='utf8'), which states the encoding explicitly instead of relying on the platform default. See the Python 3 documentation for open().
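
As a minimal sketch of how this could look in the serving script, the loader below reads the file with an explicit encoding and executes it into its own dictionary rather than into the caller's scope. The function name load_settings and the temporary language.ru file are hypothetical, used only to make the example runnable.

```python
import os
import tempfile


def load_settings(filename, encoding="utf8"):
    """Execute a Python-syntax file and return the names it defines."""
    namespace = {}
    try:
        # Explicit encoding, so the result does not depend on the server locale.
        with open(filename, "r", encoding=encoding) as f:
            source = f.read()
    except IOError:
        # A missing or unreadable file is silently skipped, as in the question.
        return namespace
    # Compiling with the filename makes tracebacks point at the right file.
    exec(compile(source, filename, "exec"), namespace)
    namespace.pop("__builtins__", None)  # added by exec(), not by the file
    return namespace


# Usage: write a small UTF-8 file like the one in the question and load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "language.ru")
    with open(path, "w", encoding="utf8") as f:
        f.write('# contents of language.ru\ntitle = "Название"\n')
    settings = load_settings(path)
    print(settings["title"])
```

Unlike the bare exec(f.read()) in the question, this keeps the loaded names in a dictionary, so the caller has to look them up (settings["title"]) instead of finding them as local variables.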

answered Oct 24 '22 08:10 by Alex Martelli


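Use the codecs library. I'm on Python 2.6.6, so I use codecs.open rather than the usual open with an encoding argument:

import codecs

# codecs.open decodes the file as UTF-8 while reading
with codecs.open('filename', 'r', encoding='UTF-8') as f:
    data = f.read()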

answered Oct 24 '22 09:10 by vieyra


You can use something like:

with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
    data = f.read()

# make changes to the string 'data'

with open(fname + '.new', 'w',
           encoding="ascii", errors="surrogateescape") as f:
    f.write(data)

More information is in the Python Unicode documentation.
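
To illustrate what errors="surrogateescape" does, here is a small in-memory sketch (no files involved) that decodes UTF-8 bytes as ASCII, edits the ASCII part, and encodes back. The sample string is borrowed from the question; the variable names are illustrative only.

```python
# UTF-8 bytes for a line like the one in language.ru
raw = 'title = "Название"\n'.encode("utf8")

# Bytes >= 0x80 become lone surrogates (U+DC80..U+DCFF) instead of raising.
text = raw.decode("ascii", errors="surrogateescape")

# The ASCII portion can be edited as ordinary text.
text = text.replace("title", "heading")

# Encoding with the same handler turns the surrogates back into the raw bytes.
restored = text.encode("ascii", errors="surrogateescape")
print(restored.decode("utf8"))
```

Note that the Cyrillic text is not usable as text while it is held this way; the handler only guarantees that the original bytes survive a round trip. When the file is known to be UTF-8, passing encoding='utf8' is the more direct fix.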

answered Oct 24 '22 09:10 by eSadr