Python - efficient copying of list sub-range

I have a list l and non-negative integers n, start0, start1.

I need the most efficient (fastest) way of copying (moving) l[start0 : start0 + n] to l[start1 : start1 + n].

I need a shallow copy (only copying references) for my current case, but it would also be interesting to know how to make a fast deep copy of a range.

Is the following the most efficient and shortest way?

l[start1 : start1 + n] = l[start0 : start0 + n]

Since the left and right sides have the same length n, in theory Python could optimize this by copying the data directly, without creating any intermediate list or resizing/re-allocating the original list. But does Python actually do such an optimization?

I tried the following measurements. I create lists of different lengths while copying just a short sub-range, in three variants: copying a same-length sub-range, a shorter sub-range, and a longer sub-range. In theory, for a same-length range Python can just copy the short block of data, which is very efficient, while for a shorter or longer sub-range Python has to move and/or re-allocate the whole large array.


# Needs: python -m pip install timerit
import math, timerit

for lsize in range(5, 27, 3):
    size = 1 << lsize
    for n in [10, 9, 11]:
        print(f'size = {str(size).rjust(8)}, n = {str(n).rjust(2)}: ', end = '', flush = True)
        # Fewer timing iterations for larger lists to keep total runtime bounded
        for t in timerit.Timerit(num = math.ceil(10 ** 5 / math.sqrt(size)), verbose = 1):
            l = [0] * size
            with t:
                # Replace a 10-element range with an n-element range
                l[10 : 20] = l[0 : n]

outputs:

size =       32, n = 10: Timed best=1.026 µs, mean=1.464 ± 0.2 µs
size =       32, n =  9: Timed best=1.026 µs, mean=1.528 ± 0.2 µs
size =       32, n = 11: Timed best=1.539 µs, mean=1.544 ± 0.1 µs
size =      256, n = 10: Timed best=1.026 µs, mean=1.532 ± 0.1 µs
size =      256, n =  9: Timed best=1.539 µs, mean=1.546 ± 0.1 µs
size =      256, n = 11: Timed best=1.539 µs, mean=2.172 ± 0.2 µs
size =     2048, n = 10: Timed best=1.539 µs, mean=1.585 ± 0.1 µs
size =     2048, n =  9: Timed best=2.565 µs, mean=2.580 ± 0.1 µs
size =     2048, n = 11: Timed best=4.105 µs, mean=4.375 ± 0.3 µs
size =    16384, n = 10: Timed best=2.052 µs, mean=2.125 ± 0.2 µs
size =    16384, n =  9: Timed best=11.802 µs, mean=12.107 ± 0.3 µs
size =    16384, n = 11: Timed best=12.315 µs, mean=12.874 ± 0.7 µs
size =   131072, n = 10: Timed best=3.592 µs, mean=3.907 ± 0.2 µs
size =   131072, n =  9: Timed best=96.988 µs, mean=100.387 ± 1.4 µs
size =   131072, n = 11: Timed best=851.852 µs, mean=873.135 ± 13.3 µs
size =  1048576, n = 10: Timed best=9.750 µs, mean=10.434 ± 0.3 µs
size =  1048576, n =  9: Timed best=1.119 ms, mean=1.127 ± 0.0 ms
size =  1048576, n = 11: Timed best=6.986 ms, mean=7.054 ± 0.0 ms
size =  8388608, n = 10: Timed best=10.263 µs, mean=10.862 ± 0.4 µs
size =  8388608, n =  9: Timed best=9.752 ms, mean=9.813 ± 0.0 ms
size =  8388608, n = 11: Timed best=58.158 ms, mean=58.692 ± 0.4 ms
size = 67108864, n = 10: Timed best=12.829 µs, mean=13.548 ± 0.4 µs
size = 67108864, n =  9: Timed best=78.876 ms, mean=79.061 ± 0.1 ms
size = 67108864, n = 11: Timed best=471.014 ms, mean=473.688 ± 1.9 ms

As we can see, for same-length ranges (n = 10) the copy takes microseconds even for long lists, while copying shorter (n = 9) or longer (n = 11) ranges takes up to milliseconds. Copying a same-length range takes comparable time for very short and very long lists (not thousands of times more, as in the n = 9 and n = 11 cases). Also, replacing a range with a shorter range is significantly faster than replacing it with a longer one, because the first only has to shift the existing array data, while the second may also need to re-allocate the array, since a larger array is required to store the new data.

I did the measurements above intentionally, just to check that Python is clever enough not to re-create the whole array when a same-length-to-same-length copy is done. A non-clever naive implementation of list slice assignment would also be possible: 1) create a temporary array copying the contents of l[0 : start1], 2) create a temporary array copying the contents of l[start0 : start0 + n], 3) create a temporary array copying the contents of l[start1 + n :], 4) concatenate 1)-3) into a new large array (sketched below). Of course I was almost sure Python doesn't do that, but I measured just to re-check. And of course only n elements are copied only in the case when the slice expression on the right has the same length as on the left; if the two sides differ in size, then half (on average) or all of the array data has to be moved, as shown by the n = 9 and n = 11 measurements.
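For illustration, such a naive strategy would look roughly like this in pure Python (this is not what CPython does; l, start0, start1, n are as defined above):

# Hypothetical naive slice assignment (NOT what CPython actually does):
# rebuild the whole list from three temporary slices.
def naive_slice_assign(l, start0, start1, n):
    # head + copied range + tail; allocates three temporaries plus
    # a brand-new list of len(l) elements on every call
    return l[:start1] + l[start0:start0 + n] + l[start1 + n:]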

So it seems that Python does optimize same-length ranges (meaning the whole original list is not re-allocated from scratch; only the range values (references) are copied). Is there any documentation proving this?

Maybe there is also some even more optimal built-in function for copying same-length ranges, like l.copy_range(start0, n, start1)? Just as C++ has std::copy(vec.begin() + start0, vec.begin() + start0 + n, vec.begin() + start1), Python might have a built-in C-based method for such copying of references.
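No such method exists as far as I know; a minimal sketch of such a helper, assuming it simply wraps slice assignment (which already runs in C inside CPython), could be:

# Hypothetical helper -- copy_range is NOT a built-in list method.
# The slice assignment below is already executed by C code in CPython.
def copy_range(l, start0, n, start1):
    l[start1:start1 + n] = l[start0:start0 + n]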

And what about a step, i.e. copying [start0 : start0 + n : step]?
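For what it's worth, extended slices do support assignment, with the restriction that for step != 1 both sides must contain exactly the same number of elements:

l = list(range(20))
# Copies the elements at indices 10, 12, 14 to positions 1, 3, 5;
# a ValueError is raised if the two extended slices differ in length.
l[1:1 + 3 * 2:2] = l[10:10 + 3 * 2:2]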

The following is also a way to copy a range:

for i in range(n):
    l[start1 + i] = l[start0 + i]

but it is probably not as efficient as the first method.

So what's the most efficient way to copy a range from one place in a list to another place in the same list?

PS. BTW, for my specific task I don't even need to copy the range but to move it from one place to another, i.e. the old region can be None-filled. If a built-in function for moving a region existed in Python, it would be enough for it to use a C function like memmove() and zeroize the old region, with no need to copy entries or increment/decrement their ref-counts.
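Since no such built-in exists, here is a minimal Python-level sketch of that move operation, assuming the source and destination ranges may overlap (move_range is a hypothetical name):

def move_range(l, start0, n, start1):
    l[start1:start1 + n] = l[start0:start0 + n]  # C-speed reference copy
    for i in range(start0, start0 + n):
        if i < start1 or i >= start1 + n:        # keep the destination intact on overlap
            l[i] = None                          # None-fill the vacated source region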

asked Dec 09 '25 by Arty


1 Answer

TL;DR: the CPython interpreter creates a shallow copy of the right-hand slice of l and then shallow-copies it into the target range.

Lists are basically arrays of object references. A shallow copy of a list is a memory array containing only object references (i.e. pointers) to objects shared with other lists (e.g. at least the copied one, just after the slice has been created). A deep copy of a list is a memory array containing object references to new object copies created from the source list. You can find more information about Python shallow/deep copies in the documentation of the standard copy module.
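A short demonstration of the difference, using the standard copy module:

import copy

outer = [[1, 2], [3, 4]]
shallow = outer[0:2]         # new array of references; inner lists are shared
deep = copy.deepcopy(outer)  # new references to newly created inner lists

shallow[0][0] = 99
print(outer[0][0])  # 99 -- the inner list is shared with the shallow copy
deep[1][0] = 77
print(outer[1][0])  # 3 -- the deep copy has its own inner lists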

This means that when you write l[0:c], a new array referencing c objects of the list l is created in memory. You can see this by building huge lists and watching memory usage and execution times:

# CPython process memory: < 50 MB
a = [0 for i in range(256*1024*1024)]  # time: 5.45 s
# CPython process memory: ~2 GB
# Note that all references refer to the same immutable object 0 and each reference (memory pointer) takes 8 bytes on my 64-bit machine.
tmp = a[0:128*1024*1024]  # time: 372 ms
# CPython process memory: ~3 GB
# tmp takes 1 GB here because of the 128*1024*1024 8-byte references
del tmp  # time: 219 ms
# CPython process memory: ~2 GB
# The internal reference memory array of tmp has been freed
a[0:128*1024*1024] = a[128*1024*1024:256*1024*1024]  # time: 1.30 s
# CPython process memory: ~2 GB
# CPython allocates 1 GB of additional memory for a fraction of the time and frees it after the shallow copy has been made.

Playing with many distinct immutable objects is also interesting (note that the memory figures here are not very precise):

# CPython process memory: < 50 MB
%time a = [i for i in range(64*1024*1024)]
# CPython process memory: ~2.6 GB
# In addition to the references, a lot of integer objects have to be created (much heavier than references)
%time tmp = a[0:32*1024*1024]
# CPython process memory: ~2.8 GB
# No integer objects are copied, but references are.
%time del tmp
# CPython process memory: ~2.6 GB
# References have been freed.
%time a[0:32*1024*1024] = a[32*1024*1024:64*1024*1024]
# CPython process memory: ~1.5 GB
# Only the object references are copied, not the integer objects.
# Half the integer objects have been freed because they are not referenced anymore; each remaining integer object is now referenced twice in the array.

Thus, CPython performs no special optimization in your case: the interpreter creates a shallow copy of the right-hand slice of the list and then performs a shallow copy into the target range (resizing the target list if the two slices differ in size). As for documentation-level proof, this is visible in list_ass_slice in CPython's Objects/listobject.c: when both slices have the same length, only the item pointers are overwritten (with reference-count updates); data is shifted with memmove and/or the array is re-allocated only when the lengths differ.

AFAIK there is no way to take a slice of a Python list without a copy. One can use generators/loops to avoid the copy, but this is slow. I am not aware of a faster way to perform such a list copy. One solution is to move to Numpy, as it has better memory views (more compact in memory):

import numpy as np

a = np.ones(256*1024*1024, dtype=np.int32)  # time: 515 ms
# CPython process memory: ~1 GB
# No references are created, just a raw memory array of 32-bit integers.
tmp = a[0:128*1024*1024]  # time: 24 ms
# CPython process memory: ~1 GB
# No shallow copy is created, just a (lightweight) view object.
del tmp  # time: 21 ms
# CPython process memory: ~1 GB
a[0:128*1024*1024] = a[128*1024*1024:256*1024*1024]  # time: 94 ms
# CPython process memory: ~1 GB
# The integers themselves are copied (as there are no references).

Note that Numpy arrays can also contain Python objects, although this is not very efficient. It should still be faster than CPython lists if the slices are big enough. Additionally, note that you can perform a deep copy of a list or any Python object with copy.deepcopy(...).
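As a small illustration of the object case (a sketch, with timings omitted), an object-dtype array stores references much like a list does:

import numpy as np

a = np.empty(8, dtype=object)  # array of 8 Python object references
for i in range(8):
    a[i] = [i]                 # each element is a distinct Python list
a[0:4] = a[4:8]                # copies the references, not the inner lists
print(a[0] is a[4])            # True: the same list object is now referenced twice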

answered Dec 11 '25 by Jérôme Richard


