
Why does an empty string in Python sometimes take up 49 bytes and sometimes 51?

Tags:

python

I tested sys.getsizeof('') and sys.getsizeof(' ') in three environments, and in two of them sys.getsizeof('') gives me 51 bytes (one byte more than sys.getsizeof(' ')) instead of 49 bytes:

Screenshots:

Win8 + Spyder + CPython 3.6:

sys.getsizeof('') == 49 and sys.getsizeof(' ') == 50

Win8 + Spyder + IPython 3.6:

sys.getsizeof('') == 51 and sys.getsizeof(' ') == 50

Win10 (VPN remote) + PyCharm + CPython 3.7:

sys.getsizeof('') == 51 and sys.getsizeof(' ') == 50

First edit

I ran a second test in python.exe instead of Spyder and PyCharm (those two still show 51), and there everything looks right: the empty string comes out at 49 bytes. Apparently I don't have the expertise to solve this problem, so I'll leave it to you guys :)

Win10 + Python 3.7 console versus PyCharm using same interpreter:

[screenshot]

Win8 + IPython 3.6 + Spyder using same interpreter:

[screenshot]

Nicholas Humphrey asked Dec 22 '18 23:12



1 Answer

This sounds like something is retrieving the wchar representation of the string object. As of CPython 3.7, an empty string is normally stored in the "compact ASCII" representation. On a 64-bit build, the base data and padding for a compact ASCII string come to 48 bytes, plus one byte of string data (just the null terminator), for 49 bytes in total. You can see the relevant header file here.
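If you want to check the base-plus-data arithmetic on your own interpreter, one rough approach is to subtract the character data and the null terminator from what sys.getsizeof reports. This is only a sketch, assuming CPython and ASCII-only strings; the helper name compact_ascii_overhead is made up for illustration, and the number it prints depends on the CPython version and build (48 is the figure for the 64-bit 3.7 build discussed here).

```python
import sys

def compact_ascii_overhead(s):
    """Bytes reported by sys.getsizeof beyond the characters and the null terminator.

    Hypothetical helper: only meaningful for ASCII-only strings on CPython,
    where each character occupies one byte.
    """
    if not s.isascii():  # str.isascii() needs Python 3.7+
        raise ValueError("expected an ASCII-only string")
    return sys.getsizeof(s) - len(s) - 1

# The overhead should be the same regardless of the string's length.
for sample in ("hello", "hello world"):
    print(repr(sample), sys.getsizeof(sample), compact_ascii_overhead(sample))
```

If the overhead is constant across ASCII strings of different lengths, the extra bytes seen on the empty string in Spyder or PyCharm must come from something attached to that particular object rather than from the base layout.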

For now (this is scheduled for removal in 4.0), there is also an option to retrieve a wchar_t representation of a string. On a platform with 2-byte wchar_t, the wchar representation of an empty string is 2 bytes (just the null terminator again). The wchar representation is cached on the string on first access, and str.__sizeof__ accounts for this extra data when it exists, resulting in a 51-byte total.
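The numbers in the question can be reproduced with a small model of that accounting. This is a sketch of the arithmetic only, not CPython's actual code: modeled_str_size and its parameters are hypothetical names, header=48 assumes the 64-bit compact ASCII layout described above, and wchar_size=2 assumes a platform with 2-byte wchar_t such as Windows.

```python
def modeled_str_size(n_chars, wchar_cached=False, header=48, wchar_size=2):
    """Model the reported size of an ASCII-only string.

    Assumptions (not measured): header=48 is the 64-bit compact ASCII base
    size, wchar_size=2 is sizeof(wchar_t) on the platform.
    """
    size = header + n_chars + 1  # one byte per character plus the null terminator
    if wchar_cached:
        # cached wchar_t copy: one unit per character plus its own terminator
        size += (n_chars + 1) * wchar_size
    return size

print(modeled_str_size(0))                     # 49: plain empty string
print(modeled_str_size(0, wchar_cached=True))  # 51: empty string with cached wchar copy
print(modeled_str_size(1))                     # 50: ' ' with no cached copy
```

Under this model, '' reporting 51 while ' ' reports 50 just means that something in the IDE's environment requested the wchar form of the empty string but not of ' '.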

user2357112 supports Monica answered Sep 28 '22 04:09