Let's say I have these two snippets of code in Python:
1 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x = x + np.array([1,1,1,1])
print(y)
2 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x += np.array([1,1,1,1])
print(y)
I thought the result of y would be the same in both examples, since y points to x and x becomes (2,3,4,5), BUT it wasn't. The results were (1,2,3,4) for 1 and (2,3,4,5) for 2.
After some research I found out what happens in each example:
#-First example---------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4]
y = x                   # make y point to x
# until now we have:    x --> [1,2,3,4]
#                       |
#                       y
x = x + np.array([1,1,1,1])
# this operation **creates a new array** [2,3,4,5]
# and makes x point to it instead of the first one,
# so we have y --> [1,2,3,4] and x --> [2,3,4,5]
#-Second example--------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4]
y = x                   # make y point to x
# until now, the same:  x --> [1,2,3,4]
#                       |
#                       y
x += np.array([1,1,1,1])
# this operation **modifies the existing array**,
# so the result is:     x --> [2,3,4,5]
#                       |
#                       y
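A quick way to check this yourself is to compare object identity; this is just a small sketch using is and NumPy's shares_memory:
import numpy as np

x = np.array([1, 2, 3, 4])
y = x
print(y is x)                   # True: y and x name the same object

x = x + np.array([1, 1, 1, 1])  # builds a new array and rebinds x
print(y is x)                   # False: y still names the old array
print(np.shares_memory(x, y))   # False: no buffer is shared anymore

x = np.array([1, 2, 3, 4])
y = x
x += np.array([1, 1, 1, 1])     # modifies the existing array in place
print(y is x)                   # True: still one single array
print(y)                        # [2 3 4 5]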
You can find out more about this behavior (not only in this example) at this link: In-place algorithm
My question is: being aware of this behavior, why should I use an in-place algorithm in terms of performance? (Faster execution time? Less memory allocation? ...)
EDIT: Clarification
The example with (+, +=) was just a simple way to explain in-place algorithms to those who don't know them, but the question is about the use of in-place algorithms in general, not only in this case.
As another, more complex example: loading a CSV file (say 10 million rows) into a variable, then sorting the result. Is the idea of an in-place algorithm to produce the output in the same memory space that contains the input, by successively transforming the data until the output is produced? This avoids the need for twice the storage: one area for the input and an equal-sized area for the output (using the minimum amount of RAM, hard disk, ...).
It seems that you understand the semantic difference between x += 1 and x = x + 1.
For benchmarking, you can use timeit in IPython.
After defining these functions:
import numpy as np

def in_place(n):
    x = np.arange(n)
    x += 1

def not_in_place(n):
    x = np.arange(n)
    x = x + 1

def in_place_no_broadcast(n):
    x = np.arange(n)
    x += np.ones(n, dtype=int)  # np.int is deprecated; use the builtin int
You can simply use the %timeit syntax to compare performance:
%timeit in_place(10**7)
20.3 ms ± 81.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit not_in_place(10**7)
30.4 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit in_place_no_broadcast(10**7)
35.4 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
not_in_place is 50% slower than in_place.
Note that broadcasting also makes a huge difference: numpy understands x += 1 as adding 1 to every single element of x, without having to create yet another array.
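Outside IPython, you can reproduce a rough version of this comparison with the standard library's timeit module (just a sketch; the absolute numbers will depend on your machine):
import timeit

setup = "import numpy as np; x = np.arange(10**7)"

# Broadcast a scalar: no temporary array is built
print(timeit.timeit("x += 1", setup=setup, number=10))

# Add a same-sized array: an extra 10**7-element array is built each time
print(timeit.timeit("x += np.ones(10**7, dtype=int)", setup=setup, number=10))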
in_place should be the preferred function: it's faster and uses less memory. You might run into bugs if you use and mutate this object at different places in your code, though. The typical example would be:
x = np.arange(5)
y = [x, x]
y[0][0] = 10
print(y)
# [array([10, 1, 2, 3, 4]), array([10, 1, 2, 3, 4])]
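If you want the speed of in-place updates but need to guard against this kind of aliasing bug, the usual fix is an explicit copy at the boundary (a small sketch):
import numpy as np

x = np.arange(5)
y = [x.copy(), x.copy()]  # each element now owns its own buffer

y[0][0] = 10
print(y)
# [array([10,  1,  2,  3,  4]), array([0, 1, 2, 3, 4])]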
Your understanding of the advantages of in-place sorting is correct. It can make a huge difference in memory requirements when sorting large data sets.
There are other desirable features for a sorting algorithm (stable, acceptable worst-case complexity, ...) and it looks like the standard Python algorithm (Timsort) has many of them.
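NumPy exposes both variants directly, so for your CSV example the difference is one line (a sketch; the random array below just stands in for the 10 million parsed rows):
import numpy as np

data = np.random.rand(10_000_000)  # stand-in for the loaded CSV column

sorted_copy = np.sort(data)        # returns a sorted copy: input and output
                                   # coexist, roughly 2x the memory
data.sort()                        # sorts the existing buffer in place:
                                   # no second full-size array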
Timsort is a hybrid algorithm. Some parts of it are in-place and some require extra memory, but it will never use more than n/2 extra space.
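The same in-place/copy split exists for Timsort in plain Python: list.sort() sorts in place, while sorted() builds a new list:
data = [3, 1, 2]

new_list = sorted(data)  # Timsort into a brand-new list; data is untouched
data.sort()              # Timsort in place; no second full-size list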