Best and/or fastest way to create lists in python

People also ask

What is faster than Python list?

Tuples are stored in a single block of memory. Because tuples are immutable, they never need spare capacity for new elements. Lists are allocated in two blocks: a fixed one holding the Python object information and a variable-sized block for the data, which is over-allocated so that later appends are cheap. This is the reason creating a tuple is generally faster than creating a list.
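A quick way to see the memory difference is `sys.getsizeof`, which reports the size of the container object itself (not of the items it references). This sketch assumes CPython; the exact byte counts vary by version and platform, so it only compares them rather than relying on specific numbers.

```python
import sys

as_tuple = (1, 2, 3)
as_list = [1, 2, 3]

# getsizeof measures only the container, not the ints it points to.
tuple_size = sys.getsizeof(as_tuple)
list_size = sys.getsizeof(as_list)

print(f"tuple: {tuple_size} bytes, list: {list_size} bytes")
```

The list is larger because it carries a separate, resizable pointer array in addition to its object header.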

Which is faster on a list, append or insert?

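Insert is slower than append, especially near the front of the list: list.insert(i, x) has to shift every element after position i one slot to the right, whereas append only writes to the end of the list.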
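A minimal `timeit` sketch of the difference, comparing `append` with `insert(0, x)` (the worst case for insert). The list length `N` and the helper names are arbitrary choices for illustration, and the printed timings will depend on your machine.

```python
from timeit import timeit

N = 10_000  # arbitrary list length for the comparison


def build_with_append(n):
    result = []
    for i in range(n):
        result.append(i)  # writes at the end: amortized O(1)
    return result


def build_with_insert(n):
    result = []
    for i in range(n):
        result.insert(0, i)  # writes at the front: shifts all items, O(n)
    return result


# Both build the same items, just in opposite order.
assert build_with_append(N) == build_with_insert(N)[::-1]

print("append:      ", timeit(lambda: build_with_append(N), number=10))
print("insert(0, x):", timeit(lambda: build_with_insert(N), number=10))
```

Inserting at the end (`insert(len(lst), x)`) does not shift anything, so the gap mostly shows up when inserting at or near the front.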

Let's run some time tests* with timeit.timeit:

>>> from timeit import timeit
>>>
>>> # Test 1
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list.append(0)
... """
>>> timeit(test)
22.384258893239178
>>>
>>> # Test 2
>>> test = """
... my_list = []
... for i in xrange(50):
...     my_list += [0]
... """
>>> timeit(test)
34.494779364416445
>>>
>>> # Test 3
>>> test = "my_list = [0 for i in xrange(50)]"
>>> timeit(test)
9.490926919482774
>>>
>>> # Test 4
>>> test = "my_list = [0] * 50"
>>> timeit(test)
1.5340533503559755
>>>

As you can see above, the last method ([0] * 50) is the fastest by far.

However, it should only be used with immutable items (such as integers), because every element of the resulting list is a reference to the same object.

Below is a demonstration:

>>> lst = [[]] * 3
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are the same
>>> id(lst[0])
28734408
>>> id(lst[1])
28734408
>>> id(lst[2])
28734408
>>>

This behavior is very often undesirable and can lead to bugs in the code.
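Here is what that sharing looks like when one element is mutated, next to the harmless immutable case (Python 3 syntax; the variable names are arbitrary):

```python
# [[]] * 3 repeats a reference to ONE inner list three times.
shared = [[]] * 3
shared[0].append("x")
print(shared)  # [['x'], ['x'], ['x']]

# With immutable items the sharing is harmless: "+=" on an int rebinds
# that slot to a new object instead of modifying the shared one.
nums = [0] * 3
nums[0] += 1
print(nums)  # [1, 0, 0]
```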

If you have mutable items (such as lists), use a list comprehension instead; it is still very fast:

>>> lst = [[] for _ in xrange(3)]
>>> lst
[[], [], []]
>>> # The ids of the items in `lst` are different
>>> id(lst[0])
28796688
>>> id(lst[1])
28796648
>>> id(lst[2])
28736168
>>>

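*Note: In all of the tests, I used Python 2's xrange instead of range. xrange produces the numbers lazily instead of building a full list first, so it is usually faster for loops like these. In Python 3, range already behaves this way and xrange no longer exists.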
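In Python 3, xrange is gone and range itself is lazy, so the four tests translate directly. The sketch below reruns them with `timeit`; `number=100_000` is an arbitrary choice, and no timings are quoted because they depend on the machine and interpreter version.

```python
from timeit import timeit

# The same four ways to build a 50-element list of zeros (Python 3 syntax).
tests = {
    "append loop": """
my_list = []
for i in range(50):
    my_list.append(0)
""",
    "+= loop": """
my_list = []
for i in range(50):
    my_list += [0]
""",
    "list comprehension": "my_list = [0 for i in range(50)]",
    "multiplication": "my_list = [0] * 50",
}

results = {}
for name, stmt in tests.items():
    # timeit runs each snippet `number` times and returns the total seconds.
    results[name] = timeit(stmt, number=100_000)
    print(f"{name:20s} {results[name]:.3f} s")
```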

If you want to see how the timings depend on the list length n:

Pure Python

[Figure: run time versus list length n for the pure-Python methods]

I tested list lengths up to n=10000 and the behavior remains the same, so integer multiplication is the fastest method by a clear margin.

Numpy

For lists with more than ~300 elements, you should consider NumPy.
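Keep in mind that `np.zeros` returns a NumPy `ndarray` (float64 by default), not a Python list, so this only pays off if the rest of your code can work with arrays; converting back with `.tolist()` creates a Python object for every element again. A small sketch, assuming NumPy is installed and with `n` chosen arbitrarily:

```python
import numpy as np

n = 1000  # arbitrary length

arr = np.zeros(n)                 # ndarray of float64 zeros
int_arr = np.zeros(n, dtype=int)  # request integers explicitly
as_list = int_arr.tolist()        # back to a plain Python list of ints

print(type(arr).__name__, arr.dtype)        # ndarray float64
print(type(as_list).__name__, as_list[:3])  # list [0, 0, 0]
```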

[Figure: run time versus list length n, "Integer multiplication" vs. "Numpy array"]

Benchmark code:

import time
from functools import wraps

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def timeit(f):
    """Decorator: call f 100 times and return the elapsed time in seconds."""
    @wraps(f)
    def timed(*args, **kwargs):
        # time.clock() was removed in Python 3.8; perf_counter() replaces it.
        start = time.perf_counter()
        for _ in range(100):
            f(*args, **kwargs)
        end = time.perf_counter()
        return end - start
    return timed


@timeit
def append_loop(n):
    """Simple loop with append"""
    my_list = []
    for i in range(n):  # xrange in the original Python 2 version
        my_list.append(0)


@timeit
def add_loop(n):
    """Simple loop with +="""
    my_list = []
    for i in range(n):
        my_list += [0]


@timeit
def list_comprehension(n):
    """List comprehension"""
    my_list = [0 for i in range(n)]


@timeit
def integer_multiplication(n):
    """List and integer multiplication"""
    my_list = [0] * n


@timeit
def numpy_array(n):
    """NumPy array of zeros (float64 ndarray, not a Python list)"""
    my_list = np.zeros(n)


df = pd.DataFrame([(integer_multiplication(n), numpy_array(n)) for n in range(1000)],
                  columns=['Integer multiplication', 'Numpy array'])
df.plot()
plt.show()  # needed to display the figure when run as a script

Gist here.

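There is one more method which, while it sounds weird, is handy in the right circumstances. If you need to produce the same list many times (initializing a matrix for roguelike pathfinding and related stuff, in my case), you can store a copy of the list in a tuple, then turn it into a list when you need it. It is noticeably quicker than generating the list via comprehensions. Note that list(some_tuple) only copies the outer level: with nested data, the inner lists would still be shared with the template, so store each row as a tuple as well and rebuild the rows one by one. Unlike list multiplication, this gives every row its own list.

class Grid:  # hypothetical class name; the original showed only the methods
    def __init__(self):
        self.l = [[1000 for x in range(1000)] for y in range(1000)]
        # Keep an immutable template. Rows are stored as tuples too, because
        # tuple(self.l) alone would still reference the same inner lists.
        self.t = tuple(tuple(row) for row in self.l)

    def some_method(self):
        # Rebuild a fresh copy of every row from the template.
        self.l = [list(row) for row in self.t]
        self._do_fancy_computation()
        # self.l is changed by this method

    def _do_fancy_computation(self):
        # Hypothetical stand-in for the real work, which modifies self.l in place.
        self.l[0][0] = 0


# Later in code:
obj = Grid()
for a in range(10):
    obj.some_method()

Voila, on every iteration you have a fresh copy of the same list in no time!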

Disclaimer:

I do not have the slightest idea why this is so quick, or whether it works anywhere outside CPython 3.4.
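A check of what `list(t)` actually copies: in CPython, converting a tuple to a list copies the references it holds in a single C-level pass, without running Python bytecode per element, which is one plausible reason it is quick. The copy is shallow, though. If the tuple holds lists, the new outer list points at the very same inner lists, as this sketch (with arbitrary names and sizes) shows:

```python
template_rows = [[1000] * 3 for _ in range(2)]
t = tuple(template_rows)        # outer container frozen, inner lists are not

copy_a = list(t)                # a new outer list...
print(copy_a is template_rows)  # False
print(copy_a[0] is t[0])        # True: ...but the same inner list objects

copy_a[0][0] = 0                # mutating a row changes the template too
print(t[0])                     # [0, 1000, 1000]

# Storing rows as tuples and rebuilding each one gives independent rows.
frozen = tuple(tuple(row) for row in [[1000] * 3 for _ in range(2)])
copy_b = [list(row) for row in frozen]
copy_b[0][0] = 0
print(frozen[0])                # (1000, 1000, 1000)
```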
