How can I compare a unicode type to a string in python?

Tags: python, list-comprehension, unicode, python-2.7

I am trying to use a list comprehension that compares string objects, but one of the strings is utf-8, the byproduct of json.loads. Scenario:

us = u'MyString' # is the utf-8 string

Part one of my question: why does this return False?

us.encode('utf-8') == "MyString" ## False

Part two - how can I compare within a list comprehension?

myComp = [utfString for utfString in jsonLoadsObj
          if utfString.encode('utf-8') == "MyString"]  # wrapped to read on S.O.

EDIT: I'm using Google App Engine, which uses Python 2.7

Here's a more complete example of the problem:

# JSON coming from the remote server; the response object looks like:
# {"number1":"first", "number2":"second"}
data = json.loads(response)
k = data.keys()

# I need something like:
myList = [item for item in k if item == "number1"]

# I thought this would work:
myList = [item for item in k if item.encode('utf-8') == "number1"]


asked May 09 '13 21:05

rGil

2 Answers

You must be looping over the wrong data set. Just loop directly over the JSON-loaded dictionary; there is no need to call .keys() first:

data = json.loads(response)
myList = [item for item in data if item == "number1"]

You may want to use u"number1" to avoid implicit conversions between Unicode and byte strings:

data = json.loads(response)
myList = [item for item in data if item == u"number1"]

Both versions work fine:

>>> import json
>>> data = json.loads('{"number1":"first", "number2":"second"}')
>>> [item for item in data if item == "number1"]
[u'number1']
>>> [item for item in data if item == u"number1"]
[u'number1']

Note that in your first example, us is not a UTF-8 string; it is Unicode data, because the json library has already decoded it for you. A UTF-8 string, on the other hand, is a sequence of encoded bytes. You may want to read up on Unicode and Python to understand the difference:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
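The difference between Unicode text and UTF-8 bytes can be seen in a few lines. This is a minimal sketch (the variable names are illustrative, not from the question) that should run on both Python 2.7 and Python 3: encoding turns text into bytes, decoding with the same codec turns it back, and a non-ASCII character shows that one character can take several bytes.

```python
# Unicode text vs. its UTF-8 encoded bytes (valid on Python 2.7 and 3).
us = u'MyString'                 # Unicode text, like the keys json.loads returns
utf8_bytes = us.encode('utf-8')  # a byte string holding the UTF-8 encoding

# Decoding the bytes with the same codec gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == us)  # True

# A non-ASCII character shows the difference: 4 characters, 5 bytes,
# because U+00E9 is encoded as two bytes in UTF-8.
accented = u'caf\u00e9'
encoded = accented.encode('utf-8')
print(len(accented))  # 4
print(len(encoded))   # 5
```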

On Python 2, your expectation that your test returns True is correct; you are doing something else wrong:

>>> us = u'MyString'
>>> us
u'MyString'
>>> type(us)
<type 'unicode'>
>>> us.encode('utf8') == 'MyString'
True
>>> type(us.encode('utf8'))
<type 'str'>

There is no need to encode the strings to UTF-8 to make comparisons; use unicode literals instead:

myComp = [elem for elem in json_data if elem == u"MyString"]
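Putting the pieces together for the data in the question, a self-contained sketch might look like the following. It assumes the response body is the JSON text shown in the question (here hard-coded as a hypothetical `response` string instead of coming from a remote server). Iterating the decoded dict yields its keys, and the u"" literal matches them without any encoding step; the u"" prefix is also accepted on Python 3.3+.

```python
import json

# Hypothetical stand-in for the remote server's response body.
response = '{"number1":"first", "number2":"second"}'
json_data = json.loads(response)

# Iterating a dict yields its keys, so .keys() is not needed,
# and comparing against a Unicode literal avoids encode() entirely.
my_comp = [elem for elem in json_data if elem == u"number1"]
print(my_comp)

# If only presence matters, a membership test on the dict is enough.
print(u"number1" in json_data)  # True
```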


answered Oct 06 '22 07:10

Martijn Pieters

You are trying to compare a string of bytes ('MyString') with a string of Unicode code points (u'MyString'). This is an "apples and oranges" comparison. Unfortunately, Python 2 pretends in some cases that this comparison is valid, instead of always returning False:

>>> u'MyString' == 'MyString'  # in my opinion should be False
True
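For contrast, Python 3 takes the stricter view: str is Unicode text and bytes is a separate type, so a mixed comparison is simply False rather than an implicit conversion. A minimal sketch, assuming Python 3 (this snippet does not describe Python 2.7 behavior):

```python
text = 'MyString'    # str: Unicode text on Python 3
data = b'MyString'   # bytes: an encoded byte string

print(text == data)                  # False: str and bytes never compare equal
print(text == data.decode('utf-8'))  # True once both sides are text
print(text.encode('utf-8') == data)  # True once both sides are bytes
```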

It's up to you as the designer/developer to decide what the correct comparison should be. Here is one possible way:

a = u'MyString'
b = 'MyString'
a.encode('UTF-8') == b  # True

I recommend the above instead of a == b.decode('UTF-8') because every u'' string can be encoded into bytes with UTF-8, except possibly in some bizarre cases, whereas not every byte string can be decoded to Unicode that way.
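To illustrate the asymmetry, here is a small sketch using a made-up byte string: 'café' encoded as Latin-1, where 0xE9 is not a complete UTF-8 sequence. Decoding it as UTF-8 raises UnicodeDecodeError, while encoding the corresponding Unicode text to UTF-8 succeeds. It should behave the same on Python 2.7 and 3.

```python
# Hypothetical input: 'cafe' with an accented e, encoded as Latin-1, not UTF-8.
latin1_bytes = b'caf\xe9'

# Decoding arbitrary bytes as UTF-8 can fail.
try:
    latin1_bytes.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False

# Encoding the Unicode text to UTF-8 works.
utf8_bytes = u'caf\u00e9'.encode('utf-8')
print(utf8_bytes == latin1_bytes)  # False: same text, different encodings
```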

However, if you UTF-8-encode the Unicode strings before comparing, the comparison fails when the byte string uses another encoding, as is common on Windows systems: u'Em dashes\u2014are cool'.encode('UTF-8') == 'Em dashes\x97are cool' is False. If you .encode('Windows-1252') instead, it succeeds. That's why it's an apples-and-oranges comparison.
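The em dash example can be checked directly. This sketch assumes the byte string really is Windows-1252 data (where byte 0x97 is the em dash); the correct encoding to compare with depends on where the bytes came from, which the code cannot infer for you.

```python
text = u'Em dashes\u2014are cool'         # U+2014 EM DASH as Unicode text
cp1252_bytes = b'Em dashes\x97are cool'   # the same text as Windows-1252 bytes

# UTF-8 encodes U+2014 as three bytes (E2 80 94), so this does not match.
print(text.encode('utf-8') == cp1252_bytes)         # False
# Windows-1252 encodes U+2014 as the single byte 0x97, so this matches.
print(text.encode('windows-1252') == cp1252_bytes)  # True
```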

answered Oct 06 '22 08:10

wberry
