What is the internal representation of a string in Python 3.x?

In Python 3.x, a string is a sequence of Unicode ordinals (see the quotation from the language reference below). What is the internal representation of a Unicode string? Is it UTF-16?

The items of a string object are Unicode code units. A Unicode code unit is represented by a string object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items.

asked Dec 03 '09 06:12

thebat

2 Answers

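The internal representation changed in Python 3.3, which implements PEP 393. The new representation stores each string in the most compact fixed-width form that can hold all of its code points: 1 byte per character (ASCII or Latin-1), 2 bytes (UCS-2), or 4 bytes (UCS-4). A UTF-8 copy may additionally be cached alongside the string when it is requested through the C API.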

Implicit conversions into surrogate pairs are done only when talking to legacy APIs (those exist only on Windows, where wchar_t is two bytes); the Python string itself is preserved. The Python 3.3 release notes describe the change.
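As a rough illustration of the point about surrogate pairs (a sketch, assuming Python 3.3 or later; the variable names are made up for this example): a character outside the Basic Multilingual Plane is a single item in the string, and a surrogate pair only shows up when the string is explicitly encoded to UTF-16.

```python
# Assumes Python 3.3+; all names here are illustrative only.
ch = "\U0001F600"  # a code point outside the Basic Multilingual Plane

# The string holds one item: the code point itself.
n_items = len(ch)
code_point = ord(ch)

# Encoding to UTF-16 (little-endian, no BOM) yields two 16-bit code units.
utf16 = ch.encode("utf-16-le")
units = [
    int.from_bytes(utf16[i:i + 2], "little")
    for i in range(0, len(utf16), 2)
]

print(n_items, hex(code_point), [hex(u) for u in units])
```

Decoding the bytes with utf16.decode("utf-16-le") gives back the original one-item string.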

answered Sep 23 '22 19:09

Tobu

In Python 3.3 and above, the internal representation depends on the contents of the string, and can be any of Latin-1, UCS-2 or UCS-4, as described in PEP 393.
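One way to observe this from Python code is to measure how much a string grows when one more character is appended. This is only a sketch: it assumes CPython 3.3 or later, relies on sys.getsizeof reporting the object's own storage (an implementation detail, not a language guarantee), and bytes_per_char is a helper name invented for this example.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate per-character storage by growing a string by one character.

    Hypothetical helper for illustration; relies on CPython's str layout.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\u00e9",
    "BMP U+20AC": "\u20ac",
    "non-BMP U+1F600": "\U0001F600",
}

widths = {label: bytes_per_char(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print(label, width)
```

On CPython the four widths should come out as 1, 1, 2 and 4 bytes, matching the Latin-1, UCS-2 and UCS-4 storage kinds. Other implementations may store strings differently.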

For earlier versions, the internal representation depends on the flags Python was built with: --enable-unicode=ucs2 or --enable-unicode=ucs4. ucs2 builds do in fact use UTF-16 as their internal representation, and ucs4 builds use UCS-4 / UTF-32.
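To tell which kind of interpreter is running, sys.maxunicode (mentioned in the quotation in the question) can be inspected. A minimal sketch follows; the function name unicode_build and its return strings are invented for this example.

```python
import sys


def unicode_build():
    """Classify the interpreter by the largest code point a str item can hold.

    Hypothetical helper for illustration only.
    """
    if sys.maxunicode == 0xFFFF:
        # Only possible on narrow builds before Python 3.3.
        return "narrow (items are UTF-16 code units)"
    return "full range (wide build, or any Python 3.3+)"


print(hex(sys.maxunicode), unicode_build())
```

From Python 3.3 on, sys.maxunicode is always 0x10FFFF, so the narrow branch can only be reached on older narrow builds.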

answered Sep 21 '22 19:09

Matthew Brett

Tags: python, string, python-3.x, unicode
