Observe the following interactive session:
In [1]: s = 'emptiness'
In [2]: s.replace('', '*')
Out[2]: '*e*m*p*t*i*n*e*s*s*'
In [3]: s.count('')
Out[3]: 10
I discovered this today, and it is a little confusing and surprising for me.
I love learning things like this about Python, but it seems like this could lead to some pretty confusing gotchas. For example, if the substring were passed in as a variable and just happened to be empty, you could end up with some surprising consequences. The behavior also seems a little inconsistent: based on the interactive session above, I would expect the following to produce a list of all the characters in the string (similar to the JavaScript behavior). Instead, you get an error:
In [4]: s.split('')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-c31bd2432bc1> in <module>()
----> 1 s.split('')
ValueError: empty separator
Also, this leads to some seemingly contradictory behavior with str.endswith() and str.startswith():
In [5]: s.endswith('')
Out[5]: True
In [6]: s.endswith('s')
Out[6]: True
In [7]: s.startswith('')
Out[7]: True
In [8]: s.startswith('e')
Out[8]: True
Experimenting with various string methods, you can find more similarly strange examples.
My question is: why does the empty string behave this way? Or is this the result of how the str methods handle empty strings? If anyone has any insights, or can point me in the direction of an explanation/description of this behavior, that would be awesome.
Python strings follow the principle that the empty string is a substring of every other string. Furthermore, you can think of a Python string as a sequence of characters with an empty string before the first character, between each pair of adjacent characters, and after the last one. A string of length n therefore contains n + 1 occurrences of the empty string. You can see that in the following examples:
>>> 'a'.count('')
2
>>> 'aa'.count('')
3
>>> 'string'.count('')
7
So 'a' must be ''+'a'+'', and 'aa' must be ''+'a'+''+'a'+''.
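To make the counting concrete, here is a minimal sketch that lists every position where the empty string matches. The helper name empty_match_positions is hypothetical (it is not part of str); it only relies on str.find, which reports a match for '' at every index from 0 up to and including len(text).

```python
def empty_match_positions(text):
    """Return every index at which '' is found in text (hypothetical helper)."""
    positions = []
    start = 0
    while True:
        # str.find('', start) returns start while start <= len(text), else -1.
        idx = text.find('', start)
        if idx == -1:
            break
        positions.append(idx)
        start = idx + 1
    return positions


s = 'emptiness'
positions = empty_match_positions(s)
# One match before each character, plus one after the last character.
print(len(positions) == len(s) + 1 == s.count(''))
```

This is also why replacing '' with '*' puts a star at both ends of the string and between every pair of characters.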
When you check 'a'.startswith(''), it sees that the string 'a' technically starts with an empty string. The same goes for 'a'.endswith(''). However, when you check 'a'.startswith('a'), it ignores the empty string and looks at the first character.
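One way to see why both checks succeed is to model the two methods as slice comparisons. The functions starts_with and ends_with below are hypothetical stand-ins written for illustration, not how CPython actually implements str.startswith and str.endswith; they just happen to agree with the built-ins on these inputs.

```python
def starts_with(text, prefix):
    # Compare the first len(prefix) characters of text with prefix.
    # For prefix == '' this compares text[:0], which is '', with ''.
    return text[:len(prefix)] == prefix


def ends_with(text, suffix):
    # Compare the last len(suffix) characters of text with suffix.
    # For suffix == '' this compares text[len(text):], which is '', with ''.
    return text[len(text) - len(suffix):] == suffix


s = 'emptiness'
for probe in ('', 'e', 's', 'x'):
    assert starts_with(s, probe) == s.startswith(probe)
    assert ends_with(s, probe) == s.endswith(probe)
```

Under this model there is no contradiction: a zero-length slice taken from either end is always the empty string, so the empty prefix and suffix match every string, while a one-character prefix is compared against the actual first character.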