Time complexity of python string index access?

Question

If I'm not mistaken, a Python string is stored in unicode scalars. However, unicode scalars can combine to form other grapheme clusters. Therefore, using memory displacement start + scalarSize * n for string[n] isn't the answer you're looking for.

Does this mean that Python iterates linearly through each scalar to get to the scalar you are looking for? If you have

word = 'caf' + char(65) + char(301) #café

Does Python store this as five scalars and iteratively check if any should be combined before moving on or does it run a check upon insertion and store 'pure' scalars?

Edit: I was confusing Python with another language. Python's print() prints out grapheme clusters but Python's str stores scalars no matter how you input them. So two combined scalars will print as one grapheme cluster which could be the same cluster as another scalar. When you go to call string[0] you'd get the scalar as inserted into the string.

user2357112 supports Monica · Accepted Answer

Python string indexing does not consider grapheme clusters. It works by Unicode code points. I don't think Python actually has anything built-in for working with grapheme clusters.

String indexing takes constant time, but if you want to retrieve the nth grapheme cluster, string indexing won't do that for you.

(People sometimes suggest applying canonical composition to the string, but there are plenty of possible grapheme clusters that still take multiple code points after canonical composition.)

Time complexity of python string index access?

Tags:

python

string

python-3.x

lanza

1 Answers

user2357112 supports Monica

Recent Activity

Donate For Us

Time complexity of python string index access?

Tags:

python

string

python-3.x

lanza

1 Answers

user2357112 supports Monica

Related questions

Recent Activity

Donate For Us