What is the fastest (to access) struct-like object in Python?

Tags:

I'm optimizing some code whose main bottleneck is running through and accessing a very large list of struct-like objects. Currently I'm using namedtuples, for readability. But some quick benchmarking using 'timeit' shows that this is really the wrong way to go where performance is a factor:

Named tuple with a, b, c:

>>> timeit("z = a.c", "from __main__ import a") 0.38655471766332994

Class using __slots__, with a, b, c:

>>> timeit("z = b.c", "from __main__ import b") 0.14527461047146062

Dictionary with keys a, b, c:

>>> timeit("z = c['c']", "from __main__ import c") 0.11588272541098377

Tuple with three values, using a constant key:

>>> timeit("z = d[2]", "from __main__ import d") 0.11106188992948773

List with three values, using a constant key:

>>> timeit("z = e[2]", "from __main__ import e") 0.086038238242508669

Tuple with three values, using a local key:

>>> timeit("z = d[key]", "from __main__ import d, key") 0.11187358437882722

List with three values, using a local key:

>>> timeit("z = e[key]", "from __main__ import e, key") 0.088604143037173344

First of all, is there anything about these little timeit tests that would render them invalid? I ran each several times, to make sure no random system event had thrown them off, and the results were almost identical.

It would appear that dictionaries offer the best balance between performance and readability, with classes coming in second. This is unfortunate, since, for my purposes, I also need the object to be sequence-like; hence my choice of namedtuple.

Lists are substantially faster, but constant keys are unmaintainable; I'd have to create a bunch of index-constants, i.e. KEY_1 = 1, KEY_2 = 2, etc. which is also not ideal.

Am I stuck with these choices, or is there an alternative that I've missed?

824

asked Apr 15 '10 14:04

DNS

2 Answers

One thing to bear in mind is that namedtuples are optimised for access as tuples. If you change your accessor to be a[2] instead of a.c, you'll see similar performance to the tuples. The reason is that the name accessors are effectively translating into calls to self[idx], so pay both the indexing and the name lookup price.

If your usage pattern is such that access by name is common, but access as tuple isn't, you could write a quick equivalent to namedtuple that does things the opposite way: defers index lookups to access by-name. However, you'll pay the price on the index lookups then. Eg here's a quick implementation:

def makestruct(name, fields):     fields = fields.split()     import textwrap     template = textwrap.dedent("""\     class {name}(object):         __slots__ = {fields!r}         def __init__(self, {args}):             {self_fields} = {args}         def __getitem__(self, idx):              return getattr(self, fields[idx])     """).format(         name=name,         fields=fields,         args=','.join(fields),          self_fields=','.join('self.' + f for f in fields))     d = {'fields': fields}     exec template in d     return d[name]

But the timings are very bad when __getitem__ must be called:

namedtuple.a  :  0.473686933517  namedtuple[0] :  0.180409193039 struct.a      :  0.180846214294 struct[0]     :  1.32191514969

ie, the same performance as a __slots__ class for attribute access (unsurprisingly - that's what it is), but huge penalties due to the double lookup in index-based accesses. (Noteworthy is that __slots__ doesn't actually help much speed-wise. It saves memory, but the access time is about the same without them.)

One third option would be to duplicate the data, eg. subclass from list and store the values both in the attributes and listdata. However you don't actually get list-equivalent performance. There's a big speed hit just in having subclassed (bringing in checks for pure-python overloads). Thus struct[0] still takes around 0.5s (compared with 0.18 for raw list) in this case, and you do double the memory usage, so this may not be worth it.

112

answered Oct 11 '22 05:10

Brian

This question is fairly old (internet-time), so I thought I'd try duplicating your test today, both with regular CPython (2.7.6), and with pypy (2.2.1) and see how the various methods compared. (I also added in an indexed lookup for the named tuple.)

This is a bit of a micro-benchmark, so YMMV, but pypy seemed to speed up named tuple access by a factor of 30 vs CPython (whereas dictionary access was only sped up by a factor of 3).

from collections import namedtuple  STest = namedtuple("TEST", "a b c") a = STest(a=1,b=2,c=3)  class Test(object):     __slots__ = ["a","b","c"]      a=1     b=2     c=3  b = Test()  c = {'a':1, 'b':2, 'c':3}  d = (1,2,3) e = [1,2,3] f = (1,2,3) g = [1,2,3] key = 2  if __name__ == '__main__':     from timeit import timeit      print("Named tuple with a, b, c:")     print(timeit("z = a.c", "from __main__ import a"))      print("Named tuple, using index:")     print(timeit("z = a[2]", "from __main__ import a"))      print("Class using __slots__, with a, b, c:")     print(timeit("z = b.c", "from __main__ import b"))      print("Dictionary with keys a, b, c:")     print(timeit("z = c['c']", "from __main__ import c"))      print("Tuple with three values, using a constant key:")         print(timeit("z = d[2]", "from __main__ import d"))      print("List with three values, using a constant key:")     print(timeit("z = e[2]", "from __main__ import e"))      print("Tuple with three values, using a local key:")     print(timeit("z = d[key]", "from __main__ import d, key"))      print("List with three values, using a local key:")     print(timeit("z = e[key]", "from __main__ import e, key"))

Python Results:

Named tuple with a, b, c: 0.124072679784 Named tuple, using index: 0.0447055962367 Class using __slots__, with a, b, c: 0.0409136944224 Dictionary with keys a, b, c: 0.0412045334915 Tuple with three values, using a constant key: 0.0449477955531 List with three values, using a constant key: 0.0331083467148 Tuple with three values, using a local key: 0.0453569025139 List with three values, using a local key: 0.033030056702

PyPy Results:

Named tuple with a, b, c: 0.00444889068604 Named tuple, using index: 0.00265598297119 Class using __slots__, with a, b, c: 0.00208616256714 Dictionary with keys a, b, c: 0.013897895813 Tuple with three values, using a constant key: 0.00275301933289 List with three values, using a constant key: 0.002760887146 Tuple with three values, using a local key: 0.002769947052 List with three values, using a local key: 0.00278806686401

answered Oct 11 '22 05:10

Gerrat

Related questions
                            
                                Why does x,y = zip(*zip(a,b)) work in Python?
                            
                                numpy: formal definition of "array_like" objects?
                            
                                What exactly should be set in PYTHONPATH?
                            
                                What's the difference between "pip install" and "python -m pip install"?
                            
                                How to pickle yourself?
                            
                                How to convert an XML file to nice pandas dataframe?
                            
                                list() uses slightly more memory than list comprehension
                            
                                Why True/False is capitalized in Python?
                            
                                Mesh grid functions in Python (meshgrid mgrid ogrid ndgrid)
                            
                                Tkinter scrollbar for frame
                            
                                Python decoding Unicode is not supported
                            
                                Why does foo = filter(...) return a <filter object>, not a list? [duplicate]
                            
                                Django Query That Get Most Recent Objects From Different Categories
                            
                                Iterate enum in definition order in Python 2
                            
                                What's the difference between __builtin__ and __builtins__?
                            
                                Why use lambda functions?
                            
                                Is there a Java equivalent of Python's 'enumerate' function?
                            
                                How can I set the last modified time of a file from python?
                            
                                Use list comprehension to build a tuple
                            
                                Get the nth item of a generator in Python

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

What is the fastest (to access) struct-like object in Python?

Tags:

performance

python

data-structures

DNS

People also ask

2 Answers

Brian

Gerrat

Recent Activity

Donate For Us