Encoding special characters for passing to a URL

I am trying to learn Python, so I thought I would start by checking my movie collection against IMDB, which was going well 😊

What I am stuck on is how to handle special characters in names, and encode the name to something a URL will respect.

For example, I have the movie Brüno.

If I encode the string using urllib.parse.quote, I get Bru%CC%88no, and when I query IMDB through OMDBAPI with that value it fails to find the movie. If I do the search via the OMDBAPI site, they encode the name as Br%C3%BCno, and this search works.

I am assuming that the encoding is using a different standard, but I can't work out what I need to do.

511

asked Mar 22 '19 14:03

PhilC

1 Answer

Both are using the same encoding (UTF-8), but different Unicode normalizations.

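>>> import unicodedata
>>> # Your string: "u" followed by U+0308 COMBINING DIAERESIS, written with an escape so it survives copy and paste
>>> "Bru\u0308no".encode("utf-8")
b'Bru\xcc\x88no'
>>> unicodedata.normalize("NFC", "Bru\u0308no").encode("utf-8")
b'Br\xc3\xbcno'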

Some graphemes (things you see as one "character"), especially those with diacritics, can be made from different sequences of code points. An "ü" can either be a "u" followed by a combining diaeresis, or the single character "ü" itself (the combined form). Combined forms don't exist for every combination of letter and diacritic, but they do for the commonly used ones (those that occur in widely used languages).

Unicode normalization transforms all the characters that form graphemes into either combined or separate characters. The normalization form "NFC", or Normalization Form Canonical Composition, combines characters as far as possible.
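To make the two spellings visible, here is a small sketch that lists the code points behind each one. The names composed, decomposed and code_points are only illustrative.

```python
import unicodedata

composed = "Br\u00fcno"     # "ü" as the single code point U+00FC
decomposed = "Bru\u0308no"  # "u" followed by U+0308 COMBINING DIAERESIS


def code_points(text):
    """Return each code point of text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The two strings render the same but differ in length and content.
print(len(composed), len(decomposed))
print(code_points(composed)[2])
print(code_points(decomposed)[2:4])
print(composed == decomposed)
```

The two strings look identical on screen, yet they compare unequal and have different lengths, which is why they also percent-encode differently.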

In comparison, the other main form, Normalization Form Canonical Decomposition, or "NFD", will produce your version:

>>> unicodedata.normalize("NFD", "Br\u00fcno").encode("utf-8")
b'Bru\xcc\x88no'
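For the URL case in the question, that means normalizing to NFC before percent-encoding. A minimal sketch, assuming the API expects the composed form (as the working Br%C3%BCno search suggests); quote_nfc is a hypothetical helper name, not something urllib provides:

```python
import unicodedata
from urllib.parse import quote


def quote_nfc(title):
    """Compose the title to NFC, then percent-encode it as UTF-8."""
    return quote(unicodedata.normalize("NFC", title))


raw = "Bru\u0308no"    # decomposed input, as in the question

print(quote(raw))      # Bru%CC%88no
print(quote_nfc(raw))  # Br%C3%BCno
```

The same normalization step applies if the query string is built some other way, for example with urllib.parse.urlencode: normalize the value first, then hand it to the encoder.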

186

answered Oct 17 '22 20:10

L3viathan

Tags:

python

python-3.x

urlencode

omdbapi
