>>> import sys
>>> sys.getsizeof("")
40
Why does the empty string use so many bytes? Does anybody know what is stored in those 40 bytes?
In Python, strings are objects, so that value is the size of the whole object, not just its character data. It will therefore always be larger than the length of the string itself.
From stringobject.h:
typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;
From here you can get some clues on how those bytes are used:
len(str)+1 bytes to store the string itself (the characters plus the terminating null byte).
You can find some information about the implementation of Python strings in a weblog article by Laurent Luce. Additionally, you can browse the source.
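As a rough illustration of the "fixed header plus len+1 bytes" layout, the sketch below measures bytes objects. It assumes CPython 3, where bytes is the closest analogue of the Python 2 str described by PyStringObject. The helper name fixed_overhead is made up for this example, and the absolute numbers depend on the build.

```python
import sys

def fixed_overhead(empty=b""):
    """Size of an empty bytes object: header fields plus the trailing NUL."""
    return sys.getsizeof(empty)

base = fixed_overhead()
for data in (b"", b"a", b"abc", b"abcdefgh"):
    # Columns: length, reported size, bytes beyond the fixed part.
    # On CPython each payload byte adds exactly one byte.
    print(len(data), sys.getsizeof(data), sys.getsizeof(data) - base)
```

The last column should track the length, which is what you would expect if only ob_sval grows while the other fields stay fixed.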
The size of string objects depends on the OS, the machine architecture, and some build choices. On 64-bit FreeBSD, using unicode for string literals (from __future__ import unicode_literals):
In [1]: dir(str)
Out[1]: ['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__',
'__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__',
'__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split',
'_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha',
'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust',
'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']
In [2]: import sys
In [3]: sys.getsizeof("")
Out[3]: 52
In [4]: sys.getsizeof("test")
Out[4]: 68
In [7]: sys.getsizeof("t")
Out[7]: 56
In [8]: sys.getsizeof("te")
Out[8]: 60
In [9]: sys.getsizeof("tes")
Out[9]: 64
Each additional character uses 4 extra bytes in this case.
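The per-character cost can be measured rather than assumed. This is a minimal sketch, assuming CPython 3.3 or later, where str stores 1, 2, or 4 bytes per character depending on the widest character in the string, so it will not reproduce the Python 2 numbers above. bytes_per_char is a hypothetical helper name.

```python
import sys

def bytes_per_char(ch, short=10, long=20):
    """Estimate storage per character by comparing two lengths of the same character."""
    grow = sys.getsizeof(ch * long) - sys.getsizeof(ch * short)
    return grow // (long - short)

# An ASCII letter, a BMP character (euro sign), and a character outside the BMP.
for ch in ("a", "\u20ac", "\U0001F600"):
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(ch), bytes_per_char(ch))
```

Comparing two lengths of the same character cancels out the fixed header, so only the per-character growth remains.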
It gives the size of a str object holding an empty value.
Calling sys.getsizeof("") creates a string object, which carries several internal fields besides the characters, and then reports the size of that object.
It is equivalent to:
import sys
x = str()
sys.getsizeof(x)  # in my environment it prints 37
Then each additional character takes 1 byte:
x = "r"
sys.getsizeof(x)  # prints 38
x = "ros"
sys.getsizeof(x)  # prints 40
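To check the claim that sys.getsizeof("") simply reports the size of an ordinary str object, the sketch below compares it with str() and with the object's own __sizeof__ method. It assumes CPython; the documentation says getsizeof calls __sizeof__ and only adds garbage-collector overhead for objects managed by the garbage collector, which str objects are not.

```python
import sys

empty_literal = ""
empty_call = str()

# Both spellings produce an empty str, so they report the same size.
print(sys.getsizeof(empty_literal), sys.getsizeof(empty_call))

# getsizeof delegates to __sizeof__; strings carry no extra GC header.
print(sys.getsizeof(empty_literal) == empty_literal.__sizeof__())
```

The absolute number printed will differ between Python versions and platforms; only the equalities are expected to hold.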