Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Execution time difference between x += y and x = x + y [duplicate]

I was trying to submit my solution to a leetcode problem wherein x and y are lists and using

x = x + y

gave me a Time limit exceeded whereas using

x += y

passed the test cases and gave me AC.

What is the execution time difference between the two and the difference in the way both are executed?

like image 598
AbHiNaV AgRaWaL Avatar asked Jun 28 '21 19:06

AbHiNaV AgRaWaL


Video Answer


2 Answers

For list objects,

temp = temp + [] 

Creates a new list, and takes linear time on the size of the resulting list (it scales linearly). Importantly, it re-creates the entire new list. If done in a loop, e.g.

x = []

for i in range(N):
    x = x + [i]

The entire algorithm is quadratic time, O(N^2)

On the other hand, temp += [] works in-place. It does not create a new list. It is O(K) where K is the size of the list on the right, i.e. the number of elements added. This works this way because python list objects are implemented as array-lists which overallocate so you don't have to reallocate each time the list increases in size. Simply put, this means that appending an item to the end of the list is amortized constant time. Importantly, this makes:

x = []

for i in range(N):
    x += [i]

linear time, i.e. O(N).

To see this behavior empirically, you could use the following script:

import pandas as pd
import matplotlib.pyplot as plt
import time

def concatenate(N):
    result = []
    for i in range(N):
        result = result + [i]

def inplace(N):
    result = []
    for i in range(N):
        result += [i]


def time_func(N, f):
    start = time.perf_counter()
    f(N)
    stop = time.perf_counter()
    return stop - start

NS = range(0, 100_001, 10_000)
inplc = [time_func(n, inplace) for n in NS]
concat = [time_func(n, concatenate) for n in NS]

df = pd.DataFrame({"in-place":inplc, "concat": concat}, index=NS)

df.plot()
plt.savefig('in-place-vs-new-list-loop.png')

enter image description here

Notice, at a N == 100_000, the concatenation version is taking over 10 seconds, whereas the in-place extend version takes 0.01 seconds... so it's several orders of magnitude slower, and the difference will keep growing dramatically (i.e. quadratically) as N increases.

To understand this behavior, here is an informal treatment of the time complexity:

For concat, at each iteration, x = x + [i] takes i amount of work, where i is the length of the resulting array. So the whole loop will be 0 + 1 + 2 + 3 + ... + N. Now, using the handy formula for the Nth partial sum of this well-known series the loop will require N*(N+1)/2 total work.

N*(N + 1) / 2 == N^2/2 + N/2 which is simply O(N^2)

On the other hand, the in-place extend version, on each iteration,

temp += [i]

Requires only 1 (constant) amount of work. So for the whole loop, it's just

1 + 1 + ... + 1 (N times)

So N total amount of work, so it is O(N)

like image 56
juanpa.arrivillaga Avatar answered Oct 17 '22 20:10

juanpa.arrivillaga


The expression a = a + b does the following:

  1. Allocate a new list big enough to hold both a and b.
  2. Copy a to the new buffer.
  3. Copy b to the new buffer.
  4. Bind the name a to the new buffer (which is what list.__add__ returns).

The allocation and copies are inevitable in this case, regardless of the fact that b is empty.

The expression a += b is roughly equivalent to list.extend with an assignment at the end:

  1. Extend the buffer of a by enough elements to hold b. This does not always involve a reallocation, because list growth is amortized to take O(n) time in the long run.
  2. Copy b to the end of a.
  3. Rebind the name a to the same object (which is what list.__iadd__ returns).

Notice that in this case, step is is a reallocation, so elements of a are copied only when the memory moves. Since b is empty in your case, nothing is reallocated or copied at all.

like image 21
Mad Physicist Avatar answered Oct 17 '22 20:10

Mad Physicist