 

How do I determine the size of an object in Python?

I want to know how to get size of objects like a string, integer, etc. in Python.

Related question: How many bytes per element are there in a Python list (tuple)?

I am using an XML file which contains size fields that specify the size of each value. I must parse this XML and do my coding. When I want to change the value of a particular field, I will check the size field of that value. Here I want to check whether the new value that I'm going to enter is the same size as the one in the XML, so I need to check the size of the new value. In the case of a string I can say it's the length. But in the case of int, float, etc. I am confused.

user46646 asked Jan 16 '09 05:01



2 Answers

Just use the sys.getsizeof function defined in the sys module.

sys.getsizeof(object[, default]):

Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.

Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to.

The default argument allows to define a value which will be returned if the object type does not provide means to retrieve the size and would cause a TypeError.

getsizeof calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.

See recursive sizeof recipe for an example of using getsizeof() recursively to find the size of containers and all their contents.

Usage example, in python 3.0:

    >>> import sys
    >>> x = 2
    >>> sys.getsizeof(x)
    24
    >>> sys.getsizeof(sys.getsizeof)
    32
    >>> sys.getsizeof('this')
    38
    >>> sys.getsizeof('this also')
    48

If you are on Python < 2.6 and don't have sys.getsizeof, you can use this extensive module instead. I have never used it, though.

nosklo answered Oct 06 '22 10:10


How do I determine the size of an object in Python?

The answer, "Just use sys.getsizeof", is not a complete answer.

That answer does work for builtin objects directly, but it does not account for what those objects may contain; specifically, what container types such as custom objects, tuples, lists, dicts, and sets contain. They can contain instances of each other, as well as numbers, strings, and other objects.
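To make the limitation concrete, here is a minimal sketch (the variable names are only illustrative, not from the original answer). It shows that sys.getsizeof reports the same size for a one-element list whether that element is tiny or large, because only the list's own header and reference are measured:

```python
import sys

big = "x" * 10000        # a large string
holds_big = [big]        # list referring to the large string
holds_none = [None]      # list referring to a tiny object

# Only the list itself (header plus one reference) is measured,
# not the object the reference points to.
shallow_big = sys.getsizeof(holds_big)
shallow_none = sys.getsizeof(holds_none)

print(shallow_big, shallow_none, sys.getsizeof(big))
```

The exact numbers depend on the interpreter build, but the two list sizes should match while the string alone is far larger.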

A More Complete Answer

Using 64-bit Python 3.6 from the Anaconda distribution, with sys.getsizeof, I have determined the minimum size of the following objects, and note that sets and dicts preallocate space so empty ones don't grow again until after a set amount (which may vary by implementation of the language):

Python 3:

    Empty
    Bytes  type        scaling notes
    28     int         +4 bytes about every 30 powers of 2
    37     bytes       +1 byte per additional byte
    49     str         +1-4 per additional character (depending on max width)
    48     tuple       +8 per additional item
    64     list        +8 for each additional
    224    set         5th increases to 736; 21st, 2272; 85th, 8416; 341st, 32992
    240    dict        6th increases to 368; 22nd, 1184; 43rd, 2280; 86th, 4704; 171st, 9320
    136    func def    does not include default args and other attrs
    1056   class def   no slots
    56     class inst  has a __dict__ attr, same scaling as dict above
    888    class def   with slots
    16     __slots__   seems to store in mutable tuple-like structure
                       first slot grows to 48, and so on.

How do you interpret this? Say you have a set with 10 items in it. If each item is 100 bytes, how big is the whole data structure? The set itself is 736 bytes, because it has sized up once to 736 bytes. Then you add the size of the items (10 × 100 = 1000 bytes), so that's 736 + 1000 = 1736 bytes in total.
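The same arithmetic can be sketched in code. The helper name shallow_plus_items is hypothetical, and the sketch only goes one level deep (it assumes the items hold no further references), so treat it as an illustration of the addition above and not as a general-purpose tool:

```python
import sys

def shallow_plus_items(container):
    """Size of the container itself plus the shallow size of each item."""
    return sys.getsizeof(container) + sum(sys.getsizeof(item) for item in container)

# Ten distinct bytes objects of different lengths, used as sample items.
ten_items = {bytes(i + 50) for i in range(10)}
total = shallow_plus_items(ten_items)

print(sys.getsizeof(ten_items), total)
```

The byte counts printed will differ between Python versions and platforms; what carries over is that the container's own size and the items' sizes are separate terms that you add together.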

Some caveats for function and class definitions:

Note each class definition has a proxy __dict__ (48 bytes) structure for class attrs. Each slot has a descriptor (like a property) in the class definition.

Slotted instances start out with 48 bytes on their first element, and increase by 8 each additional. Only empty slotted objects have 16 bytes, and an instance with no data makes very little sense.

Also, each function definition has code objects, maybe docstrings, and other possible attributes, even a __dict__.
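The per-slot growth mentioned above can be checked with a small sketch. The class names here are made up for illustration, and the assumption is a CPython build where each slot is stored as one pointer-sized field on the instance:

```python
import struct
import sys

class WithDict:
    def __init__(self):
        self.a = 1

class OneSlot:
    __slots__ = ("a",)
    def __init__(self):
        self.a = 1

class TwoSlots:
    __slots__ = ("a", "b")
    def __init__(self):
        self.a = 1
        self.b = 2

pointer_size = struct.calcsize("P")   # 8 on a 64-bit build
per_slot = sys.getsizeof(TwoSlots()) - sys.getsizeof(OneSlot())

print(per_slot, pointer_size)
print(hasattr(OneSlot(), "__dict__"), hasattr(WithDict(), "__dict__"))
```

On a 64-bit CPython the extra slot should cost one pointer, and only the non-slotted instance carries a __dict__ whose size has to be counted separately.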

Also note that we use sys.getsizeof() because we care about the marginal space usage, which includes the garbage collection overhead for the object. From the docs:

getsizeof() calls the object’s __sizeof__ method and adds an additional garbage collector overhead if the object is managed by the garbage collector.
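A quick way to see that overhead, sketched here under the assumption of CPython (variable names are illustrative), is to compare sys.getsizeof with a direct __sizeof__ call for a type the garbage collector manages and for one it does not:

```python
import sys

tracked = []        # lists are managed by the garbage collector
untracked = 12345   # ints are not

# getsizeof adds the collector's per-object header on top of __sizeof__.
gc_overhead = sys.getsizeof(tracked) - tracked.__sizeof__()
int_overhead = sys.getsizeof(untracked) - untracked.__sizeof__()

print(gc_overhead, int_overhead)
```

The list should show a positive difference and the int none; the size of the header itself varies between CPython versions.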

Also note that resizing lists (e.g. repeatedly appending to them) causes them to preallocate space, similarly to sets and dicts. From the listobject.c source code:

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     * Note: new_allocated won't overflow because the largest possible value
     *       is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
     */
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
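To see how that formula produces the growth pattern in the comment, here is a small Python transcription. The function name growth_pattern is hypothetical, and the sketch assumes a resize is triggered only when an append would exceed the current allocation; it mirrors the quoted line, and newer CPython releases may use a different formula:

```python
def growth_pattern(count):
    """Return the first `count` allocation sizes produced by repeated appends."""
    allocated = 0
    pattern = [allocated]
    while len(pattern) < count:
        newsize = allocated + 1  # the append that no longer fits
        allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
        pattern.append(allocated)
    return pattern

print(growth_pattern(10))  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```

Each allocated slot costs one pointer, so a list that has grown by appending can be larger than a list of the same length built in one step.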

Historical data

Python 2.7 analysis, confirmed with guppy.hpy and sys.getsizeof:

    Bytes  type        empty + scaling notes
    24     int         NA
    28     long        NA
    37     str         + 1 byte per additional character
    52     unicode     + 4 bytes per additional character
    56     tuple       + 8 bytes per additional item
    72     list        + 32 for first, 8 for each additional
    232    set         sixth item increases to 744; 22nd, 2280; 86th, 8424
    280    dict        sixth item increases to 1048; 22nd, 3352; 86th, 12568 *
    120    func def    does not include default args and other attrs
    64     class inst  has a __dict__ attr, same scaling as dict above
    16     __slots__   class with slots has no dict, seems to store in
                       mutable tuple-like structure.
    904    class def   has a proxy __dict__ structure for class attrs
    104    old class   makes sense, less stuff, has real dict though.

* Note that dictionaries (but not sets) got a more compact representation in Python 3.6.

I think 8 bytes per additional item to reference makes a lot of sense on a 64 bit machine. Those 8 bytes point to the place in memory the contained item is at. The 4 bytes are fixed width for unicode in Python 2, if I recall correctly, but in Python 3, str becomes a unicode of width equal to the max width of the characters.
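Both claims can be checked with a short sketch, assuming CPython 3.3 or later (where str uses a flexible internal width). The helper per_char and the sample characters are illustrative choices, not part of the original answer:

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of a C pointer in this build

# Cost of one more reference in a tuple.
per_tuple_item = sys.getsizeof((1, 2)) - sys.getsizeof((1,))

def per_char(ch):
    """Extra bytes needed for one more copy of ch in a str."""
    return sys.getsizeof(ch * 3) - sys.getsizeof(ch * 2)

ascii_width = per_char("a")             # ASCII character
bmp_width = per_char("\u20ac")          # euro sign, needs 2 bytes per character
astral_width = per_char("\U0001F600")   # emoji outside the BMP, needs 4 bytes

print(per_tuple_item, pointer_size, ascii_width, bmp_width, astral_width)
```

On a 64-bit build the tuple grows by 8 bytes per item, and the string cost per character follows the widest character it contains.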

And for more on slots, see this answer.

A More Complete Function

We want a function that searches the elements in lists, tuples, sets, dicts, obj.__dict__'s, and obj.__slots__, as well as other things we may not have yet thought of.

We want to rely on gc.get_referents to do this search because it works at the C level (making it very fast). The downside is that get_referents can return redundant members, so we need to ensure we don't double count.
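As a small illustration of the double-counting risk (the variable names are only for this sketch), a list that refers to the same object three times reports it three times through gc.get_referents, so tracking ids is needed to count it once:

```python
import gc

shared = ["payload"]
container = [shared, shared, shared]   # three references to one object

referents = gc.get_referents(container)
unique_ids = {id(obj) for obj in referents}

print(len(referents), len(unique_ids))  # 3 1
```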

Classes, modules, and functions are singletons - they exist one time in memory. We're not so interested in their size, as there's not much we can do about them - they're a part of the program. So we'll avoid counting them if they happen to be referenced.

We're going to use a blacklist of types so we don't include the entire program in our size count.

    import sys
    from types import ModuleType, FunctionType
    from gc import get_referents

    # Custom objects know their class.
    # Function objects seem to know way too much, including modules.
    # Exclude modules as well.
    BLACKLIST = type, ModuleType, FunctionType


    def getsize(obj):
        """sum size of object & members."""
        if isinstance(obj, BLACKLIST):
            raise TypeError('getsize() does not take argument of type: '+ str(type(obj)))
        seen_ids = set()
        size = 0
        objects = [obj]
        while objects:
            need_referents = []
            for obj in objects:
                if not isinstance(obj, BLACKLIST) and id(obj) not in seen_ids:
                    seen_ids.add(id(obj))
                    size += sys.getsizeof(obj)
                    need_referents.append(obj)
            objects = get_referents(*need_referents)
        return size

In contrast with the whitelisted function in the next section, this approach relies on the fact that most objects know how to traverse themselves for the purposes of garbage collection, which is approximately what we're looking for when we want to know how expensive in memory certain objects are. gc.get_referents uses this functionality. However, this measure is going to be much more expansive in scope than we intended if we are not careful.

For example, functions know quite a lot about the modules they are created in.
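A minimal sketch of that point, assuming CPython and using a throwaway function named sample: the function's referents include the globals dictionary of the namespace it was defined in, which is why following functions would pull most of the program into the count.

```python
import gc

def sample():
    return None

referents = gc.get_referents(sample)

# The function refers to the globals dict of its defining namespace.
reaches_globals = any(obj is sample.__globals__ for obj in referents)

print(reaches_globals)
```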

Another point of contrast is that strings that are keys in dictionaries are usually interned so they are not duplicated. Checking for id(key) will also allow us to avoid counting duplicates, which we do in the next section. The blacklist solution skips counting keys that are strings altogether.

Whitelisted Types, Recursive visitor

To cover most of these types myself, instead of relying on the gc module, I wrote this recursive function to try to estimate the size of most Python objects, including most builtins, types in the collections module, and custom types (slotted and otherwise).

This sort of function gives much more fine-grained control over the types we're going to count for memory usage, but has the danger of leaving important types out:

    import sys
    from numbers import Number
    from collections import deque
    from collections.abc import Set, Mapping


    ZERO_DEPTH_BASES = (str, bytes, Number, range, bytearray)


    def getsize(obj_0):
        """Recursively iterate to sum size of object & members."""
        _seen_ids = set()
        def inner(obj):
            obj_id = id(obj)
            if obj_id in _seen_ids:
                return 0
            _seen_ids.add(obj_id)
            size = sys.getsizeof(obj)
            if isinstance(obj, ZERO_DEPTH_BASES):
                pass # bypass remaining control flow and return
            elif isinstance(obj, (tuple, list, Set, deque)):
                size += sum(inner(i) for i in obj)
            elif isinstance(obj, Mapping) or hasattr(obj, 'items'):
                size += sum(inner(k) + inner(v) for k, v in getattr(obj, 'items')())
            # Check for custom object instances - may subclass above too
            if hasattr(obj, '__dict__'):
                size += inner(vars(obj))
            if hasattr(obj, '__slots__'): # can have __slots__ with __dict__
                size += sum(inner(getattr(obj, s)) for s in obj.__slots__ if hasattr(obj, s))
            return size
        return inner(obj_0)

And I tested it rather casually (I should unittest it):

    >>> getsize(['a', tuple('bcd'), Foo()])
    344
    >>> getsize(Foo())
    16
    >>> getsize(tuple('bcd'))
    194
    >>> getsize(['a', tuple('bcd'), Foo(), {'foo': 'bar', 'baz': 'bar'}])
    752
    >>> getsize({'foo': 'bar', 'baz': 'bar'})
    400
    >>> getsize({})
    280
    >>> getsize({'foo':'bar'})
    360
    >>> getsize('foo')
    40
    >>> class Bar():
    ...     def baz():
    ...         pass
    >>> getsize(Bar())
    352
    >>> getsize(Bar().__dict__)
    280
    >>> sys.getsizeof(Bar())
    72
    >>> getsize(Bar.__dict__)
    872
    >>> sys.getsizeof(Bar.__dict__)
    280

This implementation breaks down on class definitions and function definitions because we don't go after all of their attributes, but since they should only exist once in memory for the process, their size really doesn't matter too much.

Russia Must Remove Putin answered Oct 06 '22 11:10