Different object size of True and False in Python 3

Tags:

python

python-3.x

python-internals

cpython

python-2.7

Experimenting with magic methods (__sizeof__ in particular) on different Python objects, I stumbled over the following behaviour:

Python 2.7

>>> False.__sizeof__()
24
>>> True.__sizeof__()
24

Python 3.x

>>> False.__sizeof__()
24
>>> True.__sizeof__()
28

What changed in Python 3 that makes the size of True greater than the size of False?


asked Oct 26 '18 20:10

Simon Fromme


2 Answers

It is because bool is a subclass of int in both Python 2 and 3.

>>> issubclass(bool, int)
True

But the int implementation has changed.

In Python 2, int was the fixed-width type, 32 or 64 bits depending on the system, as opposed to the arbitrary-length long.

In Python 3, int is arbitrary-length: the long of Python 2 was renamed to int, and the original Python 2 int was dropped altogether.
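The relationship can be checked directly in an interpreter. The following is a minimal sketch, assuming CPython 3; the `sizes` dictionary is just an illustrative name, and the sizes are printed rather than hard-coded because the exact numbers depend on the CPython version and platform.

```python
import sys

# bool inherits from int, so True and False are stored as integer objects.
assert issubclass(bool, int)
assert isinstance(True, int)
assert True == 1 and False == 0

# Object sizes are implementation details, so collect and print them
# instead of assuming particular values.
sizes = {
    "False": sys.getsizeof(False),
    "True": sys.getsizeof(True),
    "0": sys.getsizeof(0),
    "1": sys.getsizeof(1),
}
print(sizes)
```

On CPython, True should report the same size as 1 and False the same size as 0, since bool reuses the int layout.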

In Python 2 you get exactly the same behaviour for the long objects 1L and 0L:

Python 2.7.15rc1 (default, Apr 15 2018, 21:51:34)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof(1L)
28
>>> sys.getsizeof(0L)
24

The long/Python 3 int is a variable-length object, just like a tuple: when it is allocated, enough memory is reserved to hold all the binary digits required to represent it. The length of the variable part is stored in the object header. 0 requires no binary digits (its variable length is 0), but even 1 spills over and requires an extra digit.

I.e. 0 is represented as a binary string of length 0:

<>

and 1 is represented as a 30-bit binary string:

<000000000000000000000000000001>

The default configuration in CPython stores 30 value bits in each uint32_t digit, so 2**30 - 1 still fits in 28 bytes on x86-64, while 2**30 requires 32.

2**30 - 1 is represented as

<111111111111111111111111111111>

i.e. all 30 value bits are set to 1. 2**30 needs a second digit, and its internal representation is

<000000000000000000000000000001000000000000000000000000000000>
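The digit strings above can be reproduced in pure Python. This is only a sketch of the idea, not CPython's actual code: `to_digits` is a hypothetical helper, and it returns the digits least significant first (the order of ob_digit), whereas the strings above are written most significant digit first.

```python
import sys


def to_digits(n, bits_per_digit=sys.int_info.bits_per_digit):
    """Split a non-negative int into base-2**bits_per_digit digits.

    Hypothetical helper for illustration; digits are returned least
    significant first.
    """
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    mask = (1 << bits_per_digit) - 1
    digits = []
    while n:
        digits.append(n & mask)   # keep the low bits as one digit
        n >>= bits_per_digit      # move on to the next group of bits
    return digits


# Passing 30 explicitly mirrors the default configuration described above.
for value in (0, 1, 2**30 - 1, 2**30):
    print(value, [format(d, "030b") for d in to_digits(value, 30)])
```

Zero yields an empty list, every value up to 2**30 - 1 yields a single digit, and 2**30 is the first value that needs two.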

As for True using 28 bytes instead of 24: you need not worry. True is a singleton, so only 4 bytes are lost in total in any Python program, not 4 for every use of True.
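The singleton claim is easy to check. A small sketch (the names `flags` and `true_ids` are made up for the example): every truthy result below is the very same object, so the size of True is paid once per interpreter rather than once per use.

```python
# Produce booleans in several different ways.
flags = [n > 0 for n in range(5)] + [bool("x"), not 0, 1 == 1.0]

# Collect the identities of all the True results.
true_ids = {id(v) for v in flags if v}
print(len(true_ids))
```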


answered Sep 22 '22 21:09

Antti Haapala -- Слава Україні

Both True and False are longobjects in CPython:

struct _longobject _Py_FalseStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 0)
    { 0 }
};

struct _longobject _Py_TrueStruct = {
    PyVarObject_HEAD_INIT(&PyBool_Type, 1)
    { 1 }
};

You can thus say that a Boolean is a subclass of the python-3.x int, where True has the value 1 and False has the value 0. Each struct is initialized with PyVarObject_HEAD_INIT, passing a reference to PyBool_Type as the type and 0 or 1, respectively, as ob_size.

Since python-3.x there is no separate long anymore: the two types have been merged, and an int object takes up a different amount of memory depending on the magnitude of the number.

If we inspect the source code of the longobject type, we see:

/* Long integer representation.
   The absolute value of a number is equal to
        SUM(for i=0 through abs(ob_size)-1) ob_digit[i] * 2**(SHIFT*i)
   Negative numbers are represented with ob_size < 0;
   zero is represented by ob_size == 0.
   In a normalized number, ob_digit[abs(ob_size)-1] (the most significant
   digit) is never zero. Also, in all cases, for all valid i,
        0 <= ob_digit[i] <= MASK.
   The allocation function takes care of allocating extra memory
   so that ob_digit[0] ... ob_digit[abs(ob_size)-1] are actually available.
   CAUTION: Generic code manipulating subtypes of PyVarObject has to
   aware that ints abuse ob_size's sign bit. */

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

To make a long story short, a _longobject can be seen as an array of "digits". These are not decimal digits, however, but groups of bits that can be added, multiplied, etc.

As the comment specifies:

   zero is represented by ob_size == 0.

So if the value is zero, no digits are stored, whereas small integers (values less than 2³⁰ in CPython) take one digit, and so on.
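The relation between digit count and object size can be observed from Python through sys.int_info. This is a rough sketch, assuming CPython 3; `digit_count` is a hypothetical helper, and the absolute sizes printed depend on the interpreter version and platform, so only the step between neighbouring sizes is meaningful here.

```python
import sys

bits = sys.int_info.bits_per_digit    # value bits per digit (30 on typical 64-bit builds)
digit_size = sys.int_info.sizeof_digit  # bytes used to store one digit


def digit_count(n):
    """Number of digits needed for abs(n); zero needs none (hypothetical helper)."""
    return (abs(n).bit_length() + bits - 1) // bits


for n in (1, 2**bits - 1, 2**bits, 2**(2 * bits)):
    print(n.bit_length(), digit_count(n), sys.getsizeof(n))
```

Crossing a digit boundary, for example from 2**bits - 1 to 2**bits, should grow the reported size by exactly one digit, i.e. by sys.int_info.sizeof_digit bytes.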

In python-2.x there were two representations for numbers: ints, with a fixed size (you could see this as "one digit"), and longs, with multiple digits. Since bool was a subclass of int, both True and False occupied the same space.

answered Sep 23 '22 21:09

Willem Van Onsem
