Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does hstack() copy data but hsplit() create a view on it?

Tags:

python

numpy

In NumPy, why does hstack() copy the data from the arrays being stacked:

A, B = np.array([1,2]), np.array([3,4])
C = np.hstack((A,B))
A[0]=99

gives for C:

array([1, 2, 3, 4])

whereas hsplit() creates a view on the data:

a = np.array(((1,2),(3,4)))
b, c = np.hsplit(a,2)
a[0][0]=99

gives for b:

array([[99],
       [ 3]])

I mean - what is the reasoning behind the implementation of this behaviour (which I find inconsistent and hard to remember): I accept that this happens because it's coded that way...

like image 526
xnx Avatar asked Mar 24 '14 17:03

xnx


People also ask

What is the difference between Hstack and concatenate?

Stack arrays in sequence horizontally (column wise). This is equivalent to concatenation along the second axis, except for 1-D arrays where it concatenates along the first axis.

What does NP Hstack do in Python?

hstack() function is used to stack the sequence of input arrays horizontally (i.e. column wise) to make a single array. Parameters : tup : [sequence of ndarrays] Tuple containing arrays to be stacked.

What is Hstack and Vstack in Python?

hstack combines NumPy arrays horizontally and np. vstack combines arrays vertically.

How do I combine two arrays horizontally?

concatenate() function to concat, merge, or join a sequence of two or multiple arrays into a single NumPy array. Concatenation refers to putting the contents of two or more arrays in a single array. In Python NumPy, we can join arrays by axes (vertical or horizontal), whereas in SQL we join tables based on keys.


2 Answers

Basically the underlying ndarray data structure only has a single pointer to the start of its data's memory and then stride information about how to move through each dimension. If you concatenate two arrays, it won't know how to move from one memory location to the other. On the other hand, if you split an array into two arrays, each can easily store a pointer to the first element (which is somewhere inside the original array).

The basic C implementation is here, and there is a good discussion at:

http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#life-of-ndarray

like image 117
JoshAdel Avatar answered Sep 28 '22 19:09

JoshAdel


NumPy generally tries to create views whenever possible, since memory copies are inefficient and can quite quickly eat up a lot of cycles.

hsplit splits the input array into multiple output arrays. The output arrays can each be views into a portion of the original parent array (since they are basically simple slices). Thus, for efficiency, NumPy creates views, instead of copies.

hstack combines two completely separate arrays into a single output array. The underlying array implementation cannot handle two separate data sources in a single array, so there is no way to share the data with the original. Thus, NumPy is forced to create a copy.

like image 38
nneonneo Avatar answered Sep 28 '22 18:09

nneonneo