
Merging a list of numpy arrays into one array (fast)

What would be the fastest way to merge a list of numpy arrays into one array, given that the length of the list and the size of the arrays (which is the same for all of them) are known in advance?

I tried two approaches:

  • merged_array = array(list_of_arrays), from "Pythonic way to create a numpy array from a list of numpy arrays", and

  • vstack

As you can see, vstack is faster, but for some reason the first run takes about three times longer than the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?

Thanks!

[UPDATE]

I want (25600, 320), not (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks Joris for pointing that out!

Output:

0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second

Code:

import numpy
import time

width = 320
height = 320
n_matrices = 80

# Build two independent lists of 80 float32 matrices with values 0..9.
secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp * 9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp * 9))

# Approach 1: numpy.array on the whole list, result shape (80, 320, 320).
t1 = time.time()
first1 = numpy.array(firstmatrices)
print(time.time() - t1, "s merged_array = array(first_list_of_arrays)")

t1 = time.time()
second1 = numpy.array(secondmatrices)
print(time.time() - t1, "s merged_array = array(second_list_of_arrays)")

# Approach 2: repeated vstack, one matrix at a time (this empties the lists).
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(), first2))
print(time.time() - t1, "s vstack first")

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(), second2))
print(time.time() - t1, "s vstack second")
Asked by Framester, May 17 '11 12:05


1 Answer

You have 80 arrays of 320x320? Then you probably want to use dstack:

first3 = numpy.dstack(firstmatrices) 

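This returns one 320x320x80 array (dstack stacks along the third axis). It holds the same data as the 80x320x320 array returned by numpy.array(firstmatrices), only with the list index on the last axis instead of the first, and it is much faster: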

timeit numpy.dstack(firstmatrices)
10 loops, best of 3: 47.1 ms per loop

timeit numpy.array(firstmatrices)
1 loops, best of 3: 750 ms per loop
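To see how the two calls lay out the result, here is a small sketch. The sizes (4 arrays of 3x5) are assumptions chosen only so that the axes are easy to tell apart; they stand in for the 80 arrays of 320x320 in the question.

```python
import numpy as np

# Illustrative sizes (assumed, not from the question): 4 arrays of shape (3, 5).
n_matrices, height, width = 4, 3, 5
matrices = [np.full((height, width), i, dtype=np.float32)
            for i in range(n_matrices)]

as_array = np.array(matrices)    # list index becomes the first axis
d_stacked = np.dstack(matrices)  # list index becomes the last axis

print(as_array.shape)
print(d_stacked.shape)

# Moving the last axis of the dstack result to the front gives the same
# array that numpy.array produces.
reordered = d_stacked.transpose(2, 0, 1)
print(np.array_equal(reordered, as_array))
```

So the two results contain the same values and differ only in where the list index ends up.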

If you want to use vstack, it will return a 25600x320 array:

timeit numpy.vstack(firstmatrices)
100 loops, best of 3: 18.2 ms per loop
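On the preallocation part of the question: a minimal sketch, assuming all arrays are 2-D and share the same shape and dtype, is to allocate the result once with numpy.empty and copy each array into its slice of rows. The helper name merge_preallocated is made up for this illustration, and no timing is claimed for it here.

```python
import numpy as np

def merge_preallocated(arrays):
    """Hypothetical helper: stack equally shaped 2-D arrays row-wise
    into a single array that is allocated once up front."""
    height, width = arrays[0].shape
    merged = np.empty((len(arrays) * height, width), dtype=arrays[0].dtype)
    for i, block in enumerate(arrays):
        # Copy block i into rows [i*height, (i+1)*height).
        merged[i * height:(i + 1) * height] = block
    return merged

# Same kind of test data as in the question: 80 float32 matrices of 320x320.
matrices = [np.round(np.random.rand(320, 320).astype(np.float32) * 9)
            for _ in range(80)]

merged = merge_preallocated(matrices)
print(merged.shape)

# The result matches a single vstack call over the whole list.
print(np.array_equal(merged, np.vstack(matrices)))
```

Note that the loop in the question calls vstack once per matrix, so the growing result is copied again on every iteration. Passing the whole list to vstack in one call, as above, avoids that repeated copying.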
Answered by eumiro, Oct 09 '22 05:10