Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Copy performance: list vs array

I was optimizing a piece of code and figured out that list copying (shallow) was the bottleneck.

Now I am curious: why is list copying so much slower than copying an array of 8-bytes? In my opinion there should be no difference. In both cases it should be just memcpy(dst, src, sizeof(int64_t)*len(src)) since pointers are 8 bytes long. But apparently Python does more work than I expected. Is it somehow related to GC? Or is it possible that lists are implemented as linked lists?

import array
import numpy as np
import timeit


n = 100*1000
lst = [i for i in range(n)]
arr = array.array('q', lst)
nmp = np.array(arr, dtype=np.int64)

assert(arr.itemsize == 8)

n_iter = 100000
print('=== copy() ===')
print('List of int:', timeit.timeit(stmt='lst.copy()', setup='from __main__ import lst', number=n_iter))
print('Array of 8-bytes:', timeit.timeit(stmt='arr.__copy__()', setup='from __main__ import arr', number=n_iter))
print('Numpy array of int64:', timeit.timeit(stmt='nmp.copy()', setup='from __main__ import nmp', number=n_iter))

The results:

=== copy() ===
List of int: 27.434935861998383
Array of 8-bytes: 2.6839109230022586
Numpy array of int64: 2.69919407800262
like image 508
gudok Avatar asked Feb 23 '20 15:02

gudok


People also ask

Is a list faster than an array?

An array is faster and that is because ArrayList uses a fixed amount of array. However when you add an element to the ArrayList and it overflows. It creates a new Array and copies every element from the old one to the new one. List over arrays.

Is it better to use lists or arrays?

Arrays can store data very compactly and are more efficient for storing large amounts of data. Arrays are great for numerical operations; lists cannot directly handle math operations. For example, you can divide each element of an array by the same number with just one line of code.

Which is faster in Python list or array?

An array is faster than a list in python since all the elements stored in an array are homogeneous i.e., they have the same data type whereas a list contains heterogeneous elements. Moreover, Python arrays are implemented in C which makes it a lot faster than lists that are built-in in Python itself.

Is list slower than array?

Slower Search Time:Linked list have slower search times than arrays as random access is not allowed. Unlike arrays where the elements can be search by index, linked list require iteration.


1 Answers

With the list, it's not just about copying references to the objects. It's also increasing the reference counters inside the objects. Plus those aren't cache-friendly, since they're not nicely adjacent to each other because they're inside objects (along with other data of the objects) and those objects are somewhere on the heap. Also see Why is copying a shuffled list much slower? (you have the faster "unshuffled" case here, but the explanations there might still be useful).

like image 91
Kelly Bundy Avatar answered Sep 30 '22 18:09

Kelly Bundy