I am seeing strange behavior involving ArcPy and Python encoding. I work in Visual Studio 2010 Shell with Python Tools for Visual Studio (PTVS) installed. I isolated the problem in a simple script file, shown below. In Visual Studio, I set « Advanced Save Options... » to « UTF-8 without signature ». The script simply prints an accented string to the screen, imports the arcpy module, then prints the same string again. Importing ArcPy seems to change the Python encoding setup, but I don't know why, and I would like to restore it because it causes problems throughout the original script.
I checked the Python « encoding » folder and deleted every .pyc file. Then I ran the script, and it generated 3 .pyc files:
When ArcPy is imported, something alters the encoding in a way that affects the variables defined before the import.
Why?
Is there a Python command that can find where ArcPy sets the cp1252 encoding and read it, so that I can write a function that deals with it?
# -*- coding: utf-8 -*-
import sys
print ('Loaded encoding : %(t)s'%{'t':sys.getdefaultencoding()})
reload(sys) # See stackoverflow question 2276200
sys.setdefaultencoding('utf-8')
print ('Set default encoding : %(t)s'%{'t':sys.getdefaultencoding()})
print ''
texte = u'Récuperation des données'
print ('Original type : %(t)s'%{'t':type(texte)})
print ('Original text : %(t)s'%{'t':texte})
print ''
import arcpy
print ('imported arcpy')
print ('Loaded encoding : %(t)s'%{'t':sys.getdefaultencoding()})
print ''
print ('arcpy mess up original type : %(t)s'%{'t':type(texte)})
print ('arcpy mess up original text : %(t)s'%{'t':texte})
print ''
print ('arcpy mess up reencoded with cp1252 type : %(t)s'%{'t':type(texte.encode('cp1252'))})
print ('arcpy mess up reencoded with cp1252 text : %(t)s'%{'t':texte.encode('cp1252')})
raw_input()
When I run the script, I get these results:
Loaded encoding : ascii
Set encoding : utf-8
Original type : <type 'unicode'>
Original text : Récuperation des données <--- This is right
import arcpy
Loaded encoding : utf-8
arcpy mess up original type : <type 'unicode'>
arcpy mess up original text : R'cuperation des donn'es> <--- This is wrong
arcpy mess up ReEncode with cp1252 type : <type 'str'>
arcpy mess up ReEncode with cp1252 text : Récuperation des données> <--- This matches the original unicode
Answering my own question.
From ESRI support, I got this information:
By default, python in the command line is not going to change the code page to a UTF-8 based text for print statements to show up in Unicode. ArcGIS on the other hand specifically allows unicode values to be passed to it and has changed the code page within the command line so that the values you see printed are the values ArcGIS is using. This is why the command line should be the only environment where you see the import sys followed by import arcpy give you a different printed value.
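To illustrate the kind of mismatch a code page change can cause, here is a minimal sketch. The choice of cp850 as the "other" console code page is an assumption made purely for illustration; the post itself only names cp1252. It shows that the same unicode string becomes different bytes under different codecs, and that bytes written for one code page come out garbled when read under another.

```python
# -*- coding: utf-8 -*-
# Illustration only: cp850 is an assumed console code page, not one
# confirmed by the post. cp1252 is the code page named in the question.
text = u'donn\xe9es'  # u'données', written with an escape to stay source-encoding neutral

as_cp850 = text.encode('cp850')    # bytes a cp850 console would expect
as_cp1252 = text.encode('cp1252')  # bytes a cp1252 console would expect

# The accented character maps to a different byte in each code page,
# so bytes prepared for one console are misread by the other.
misread = as_cp850.decode('cp1252')
```

Here `misread` no longer contains the accented letter, which is why choosing the encoding that matches the active console matters.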
My application runs scripts that do not always need arcpy, depending on what I want them to do. To solve my problem, I wrote a generic function that handles the encoding whether or not arcpy has been imported, using the information provided by:
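import sys
import locale

Coding_CMD_Window = sys.stdout.encoding    # encoding of the command window
Coding_OS = locale.getpreferredencoding()  # encoding preferred by the OS
Coding_Script = sys.getdefaultencoding()   # Python's default encoding
Coding2Use = Coding_CMD_Window
# Once arcpy has been imported, use the OS encoding instead.
if any('arcpy' in importedmodules for importedmodules in sys.modules):
    Coding2Use = Coding_OS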
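The generic function itself is not shown in the post, so the following is only a sketch of how the selection above might be wrapped up. The names `pick_output_encoding` and `to_console_bytes` are hypothetical, as are the `modules` and `stdout_encoding` parameters, which exist only to make the logic easy to try without arcpy installed. It also checks for a module named exactly `arcpy` (or a submodule of it), which is slightly stricter than the substring test above.

```python
import locale
import sys


def pick_output_encoding(modules=None, stdout_encoding=None):
    """Hypothetical helper: choose the encoding to use for printing."""
    if modules is None:
        modules = sys.modules
    if stdout_encoding is None:
        # sys.stdout.encoding can be None when output is piped or redirected.
        stdout_encoding = getattr(sys.stdout, 'encoding', None)
    os_encoding = locale.getpreferredencoding()
    # If arcpy has been imported, prefer the OS encoding.
    if any(name == 'arcpy' or name.startswith('arcpy.') for name in modules):
        return os_encoding
    # Otherwise use the console encoding, falling back to the OS encoding.
    return stdout_encoding or os_encoding


def to_console_bytes(text, encoding=None):
    """Hypothetical helper: encode a unicode string for the console.

    Characters the target encoding cannot represent are replaced
    instead of raising UnicodeEncodeError.
    """
    return text.encode(encoding or pick_output_encoding(), 'replace')
```

A script would then call `to_console_bytes(texte)` before printing, regardless of whether arcpy was imported earlier.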
Also, I made sure that all of my scripts had the proper UTF-8 encoding without signature.
Hope this helps anyone.