
Combining NumPy arrays

I have two 20x100x3 NumPy arrays which I want to combine into a 40x100x3 array, that is, just add more rows to the array. I am not sure which function I want: is it vstack, hstack, column_stack or maybe something else?

Double AA asked Jul 18 '11 22:07


4 Answers

Might be worth mentioning that

    np.concatenate((a1, a2, ...), axis=0) 

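is the general form, and vstack and hstack are special cases of it (for arrays with at least two dimensions, they correspond to axis=0 and axis=1 respectively). I find it easiest to just know which dimension I want to stack over and provide that as the axis argument to np.concatenate.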
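
As a rough sketch of what choosing the axis means for arrays shaped like the ones in the question: the names a1 and a2 and their fill values below are placeholders chosen for illustration, not something from the original post.

```python
import numpy as np

# Two arrays with the shape from the question (contents are arbitrary placeholders).
a1 = np.zeros((20, 100, 3))
a2 = np.ones((20, 100, 3))

# The axis argument picks which dimension grows; the others must match.
rows = np.concatenate((a1, a2), axis=0)   # shape (40, 100, 3)
cols = np.concatenate((a1, a2), axis=1)   # shape (20, 200, 3)
depth = np.concatenate((a1, a2), axis=2)  # shape (20, 100, 6)

# For inputs with three dimensions, the stacking helpers give the same arrays.
same_as_vstack = np.array_equal(rows, np.vstack((a1, a2)))
same_as_hstack = np.array_equal(cols, np.hstack((a1, a2)))
same_as_dstack = np.array_equal(depth, np.dstack((a1, a2)))

print(rows.shape, cols.shape, depth.shape)
print(same_as_vstack, same_as_hstack, same_as_dstack)
```

For the 40x100x3 result asked about, axis=0 is the one to use.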

Ben Racine answered Oct 29 '22 16:10


I believe it's vstack you want:

import numpy

p = array_1  # first 20x100x3 array
q = array_2  # second 20x100x3 array
p = numpy.vstack([p, q])  # shape (40, 100, 3)

Giltech answered Oct 29 '22 15:10


Experimenting is one of the best ways to learn, but I would say you want np.vstack, although there are other ways of doing the same thing:

import numpy as np

a = np.ones((20, 100, 3))
b = np.vstack((a, a))

print(b.shape)  # (40, 100, 3)

or

b = np.concatenate((a,a),axis=0)

EDIT

Just as a note: on my machine, for arrays of the size in the OP's question, np.concatenate is about 2x faster than np.vstack:

In [172]: a = np.random.normal(size=(20,100,3))

In [173]: c = np.random.normal(size=(20,100,3))

In [174]: %timeit b = np.concatenate((a,c),axis=0)
100000 loops, best of 3: 13.3 us per loop

In [175]: %timeit b = np.vstack((a,c))
10000 loops, best of 3: 26.1 us per loop
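
For anyone who wants to repeat this outside IPython, here is a minimal sketch using the standard-library timeit module. The helper name time_per_call and the loop counts are my own choices for illustration, and the numbers will differ by machine and NumPy version.

```python
import timeit

import numpy as np

a = np.random.normal(size=(20, 100, 3))
c = np.random.normal(size=(20, 100, 3))


def time_per_call(func, number=2000, repeat=3):
    """Return the best average time per call, in microseconds.

    Hypothetical helper: runs `func` `number` times, `repeat` times over,
    and keeps the fastest run, similar in spirit to %timeit.
    """
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6


t_concat = time_per_call(lambda: np.concatenate((a, c), axis=0))
t_vstack = time_per_call(lambda: np.vstack((a, c)))

print("concatenate: %.1f us per call" % t_concat)
print("vstack:      %.1f us per call" % t_vstack)
```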

JoshAdel answered Oct 29 '22 15:10


I tried a little benchmark between r_ and vstack and the result is very interesting:

import numpy as np

NCOLS = 10
NROWS = 2
NMATRICES = 10000

def mergeR(matrices):
    # Grow the result one matrix at a time; np.r_ builds a new array on every pass.
    result = np.zeros([0, NCOLS])

    for m in matrices:
        result = np.r_[result, m]

    return result

def mergeVstack(matrices):
    # Stack all matrices in a single call.
    return np.vstack(matrices)

def main():
    # Build NMATRICES random matrices of shape NROWS x NCOLS.
    matrices = tuple(np.random.random([NROWS, NCOLS]) for i in range(NMATRICES))
    mergeR(matrices)
    mergeVstack(matrices)

    return 0

if __name__ == '__main__':
    main()

Then I ran the profiler:

python -m cProfile -s cumulative np_merge_benchmark.py

and the results:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
...
     1    0.579    0.579    4.139    4.139 np_merge_benchmark.py:21(mergeR)
...
     1    0.000    0.000    0.054    0.054 np_merge_benchmark.py:27(mergeVstack)

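So the vstack way is about 77x faster (4.139 s vs. 0.054 s cumulative)! Most likely this is because the loop with r_ copies the ever-growing result on every iteration, while vstack builds the output in a single call.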
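
The benchmark only measures time, so here is a small sketch that checks both strategies really produce the same array. The function names merge_incremental and merge_once, the smaller matrix count, and the use of np.random.default_rng (which needs a reasonably recent NumPy) are my own assumptions for illustration.

```python
import numpy as np

NROWS, NCOLS, NMATRICES = 2, 10, 1000

# Seeded generator so the run is repeatable (assumes NumPy 1.17 or newer).
rng = np.random.default_rng(0)
matrices = [rng.random((NROWS, NCOLS)) for _ in range(NMATRICES)]


def merge_incremental(blocks):
    """Grow the result one block at a time; every step copies all rows so far."""
    result = np.zeros((0, NCOLS))
    for block in blocks:
        result = np.r_[result, block]
    return result


def merge_once(blocks):
    """Pass the whole sequence to vstack in a single call."""
    return np.vstack(blocks)


slow = merge_incremental(matrices)
fast = merge_once(matrices)

print(slow.shape, fast.shape)
print(np.array_equal(slow, fast))
```

If the pieces arrive one at a time, collecting them in a list and stacking once at the end avoids the repeated copying.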

Michel Samia answered Oct 29 '22 16:10