Unexpected empty strings within Python strings

Observe the following interactive session:

In [1]: s = 'emptiness'

In [2]: s.replace('', '*')
Out[2]: '*e*m*p*t*i*n*e*s*s*'

In [3]: s.count('')
Out[3]: 10

I discovered this today, and it is a little confusing and surprising to me.

I love learning things like this about Python, but it seems like this could lead to some confusing gotchas. For example, if a search string were passed in as a variable and just happened to be empty, the results could be surprising. The behavior also seems a little inconsistent: based on the interactive session above, I would expect the following to produce a list of all the characters in the string (as "emptiness".split('') does in JavaScript). Instead, you get an error:

In [4]: s.split('')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-c31bd2432bc1> in <module>()
----> 1 s.split('')

ValueError: empty separator
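For comparison, list(s) already gives the JavaScript-style list of characters, and str.split refuses an empty separator rather than guessing what it should mean. The gotcha of a variable that happens to be empty can be guarded against explicitly. In the minimal sketch below, replace_checked is a hypothetical helper (not part of Python), assuming that raising an error is the behavior you want for an empty search string.

```python
def replace_checked(s: str, old: str, new: str) -> str:
    """Hypothetical helper: str.replace that refuses an empty `old`.

    s.replace('', new) silently inserts `new` at every position, which is
    rarely intended when `old` arrives in a variable that happens to be empty.
    """
    if not old:
        raise ValueError("replace_checked: empty search string")
    return s.replace(old, new)


s = "emptiness"

# list() is the usual way to break a string into its characters.
chars = list(s)
print(chars)  # ['e', 'm', 'p', 't', 'i', 'n', 'e', 's', 's']

# str.split rejects an empty separator instead of splitting per character.
try:
    s.split("")
except ValueError as exc:
    print("split:", exc)  # split: empty separator

print(replace_checked(s, "s", "S"))  # emptineSS

# An empty search string now fails loudly instead of returning '*e*m*...'.
try:
    replace_checked(s, "", "*")
except ValueError as exc:
    print(exc)  # replace_checked: empty search string
```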

Also, this leads to some seemingly contradictory behavior with str.endswith() and str.startswith():

In [5]: s.endswith('')
Out[5]: True

In [6]: s.endswith('s')
Out[6]: True

In [7]: s.startswith('')
Out[7]: True

In [8]: s.startswith('e')
Out[8]: True

By experimenting with other string methods, you can find more strange examples like these.
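A few more examples of the same pattern, using only built-in str methods. The empty string matches at every index from 0 to len(s), so the search methods simply report the first or last such index, while methods that need a real separator (split, partition) reject it.

```python
s = "abc"

print("" in s)        # True: '' is a substring of every string
print(s.find(""))     # 0: the first position where '' matches
print(s.rfind(""))    # 3: the last position, which is len(s)
print(s.find("", 3))  # 3: '' still matches at the very end
print(s.find("", 4))  # -1: a start index past len(s) has no match
print("".count(""))   # 1: even the empty string contains '' once

# partition, like split, refuses an empty separator.
try:
    s.partition("")
except ValueError as exc:
    print("partition:", exc)  # partition: empty separator
```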

My question is: why does the empty string behave this way? Or is this the result of how the str methods handle empty strings? If anyone has any insights, or can point me toward an explanation or description of this behavior, that would be awesome.

asked Apr 12 '17 at 02:04 by elethan


1 Answer

Python strings follow the principle that the empty string is a substring of every string. Put another way, a string can be viewed as its characters sandwiched between empty strings: a string of length n contains the empty string at n + 1 positions (before the first character, between each pair of characters, and after the last). You can see that in the following examples:

>>> 'a'.count('')
2
>>> 'aa'.count('')
3
>>> 'string'.count('')
7

So 'a' can be thought of as ''+'a'+'', and 'aa' as ''+'a'+''+'a'+''.
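The "sandwiched between empty strings" picture can be checked directly. Every boundary index i from 0 to len(s) is a place where the empty string matches, which is why count('') returns len(s) + 1 and why replace('', '*') puts a star at each boundary. The empty_match_positions function below is a purely illustrative helper, not a standard one.

```python
def empty_match_positions(s: str) -> list[int]:
    """Illustrative helper: every index i at which '' matches inside s."""
    return [i for i in range(len(s) + 1) if s.startswith("", i)]


s = "emptiness"

positions = empty_match_positions(s)
print(positions)                    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(positions), s.count(""))  # 10 10

# The characters of s with an empty string at each end.
pieces = ["", *s, ""]
print("".join(pieces) == s)                    # True: gluing with '' gives s back
print("*".join(pieces) == s.replace("", "*"))  # True: gluing with '*' matches replace
```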

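When you check 'a'.startswith(''), the result is True because the string 'a' really does start with an empty string: its first zero characters, 'a'[:0], are ''. The same goes for 'a'.endswith(''). When you check 'a'.startswith('a'), the method compares the first character instead, which is also a match. There is no contradiction: a string starts with several prefixes at once, and the empty string is simply the shortest of them.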
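Conceptually, s.startswith(p) asks whether the first len(p) characters of s equal p, and s.endswith(p) asks the same of the last len(p) characters. When p is '', that slice is empty, so the answer is always True. The two functions below are a simplified model of this check under that assumption; they ignore the optional start/end arguments and tuple arguments the real methods accept, and they are not CPython's implementation.

```python
def starts_with(s: str, prefix: str) -> bool:
    """Simplified model of str.startswith: is the leading slice equal to prefix?"""
    return s[:len(prefix)] == prefix


def ends_with(s: str, suffix: str) -> bool:
    """Simplified model of str.endswith: is the trailing slice equal to suffix?

    Uses s[len(s) - len(suffix):] rather than s[-len(suffix):], because for an
    empty suffix -0 == 0 and s[-0:] would be the whole string, not ''.
    """
    return s[len(s) - len(suffix):] == suffix


s = "emptiness"

# A string starts with many prefixes at once; '' is just the shortest one.
for prefix in ["", "e", "em", "emptiness", "m"]:
    assert starts_with(s, prefix) == s.startswith(prefix)
    print(repr(prefix), starts_with(s, prefix))

# Likewise for suffixes.
for suffix in ["", "s", "ss", "x"]:
    assert ends_with(s, suffix) == s.endswith(suffix)
    print(repr(suffix), ends_with(s, suffix))
```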

answered Oct 27 '22 at 20:10 by lordingtar