In NumPy, why does hstack()
copy the data from the arrays being stacked:
A, B = np.array([1,2]), np.array([3,4])
C = np.hstack((A,B))
A[0]=99
gives for C
:
array([1, 2, 3, 4])
whereas hsplit()
creates a view on the data:
a = np.array(((1,2),(3,4)))
b, c = np.hsplit(a,2)
a[0][0]=99
gives for b
:
array([[99],
[ 3]])
I mean - what is the reasoning behind the implementation of this behaviour (which I find inconsistent and hard to remember): I accept that this happens because it's coded that way...
Stack arrays in sequence horizontally (column wise). This is equivalent to concatenation along the second axis, except for 1-D arrays where it concatenates along the first axis.
hstack() function is used to stack the sequence of input arrays horizontally (i.e. column wise) to make a single array. Parameters : tup : [sequence of ndarrays] Tuple containing arrays to be stacked.
hstack combines NumPy arrays horizontally and np. vstack combines arrays vertically.
concatenate() function to concat, merge, or join a sequence of two or multiple arrays into a single NumPy array. Concatenation refers to putting the contents of two or more arrays in a single array. In Python NumPy, we can join arrays by axes (vertical or horizontal), whereas in SQL we join tables based on keys.
Basically the underlying ndarray data structure only has a single pointer to the start of its data's memory and then stride information about how to move through each dimension. If you concatenate two arrays, it won't know how to move from one memory location to the other. On the other hand, if you split an array into two arrays, each can easily store a pointer to the first element (which is somewhere inside the original array).
The basic C implementation is here, and there is a good discussion at:
http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#life-of-ndarray
NumPy generally tries to create views whenever possible, since memory copies are inefficient and can quite quickly eat up a lot of cycles.
hsplit
splits the input array into multiple output arrays. The output arrays can each be views into a portion of the original parent array (since they are basically simple slices). Thus, for efficiency, NumPy creates views, instead of copies.
hstack
combines two completely separate arrays into a single output array. The underlying array implementation cannot handle two separate data sources in a single array, so there is no way to share the data with the original. Thus, NumPy is forced to create a copy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With