Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the fastest way to stack numpy arrays in a loop?

Tags:

python

numpy

I have a code that generates me within a for loop two numpy arrays (data_transform). In the first loop generates a numpy array of (40, 2) and in the second loop one of (175, 2). I want to concatenate these two arrays into one, to give me an array of (215, 2). I tried with np.concatenate and with np.append, but it gives me an error since the arrays must be the same size. Here is an example of how I am doing the code:

result_arr = np.array([])

for label in labels_set:
    data = [index for index, value in enumerate(labels_list) if value == label]
    for i in data:
        sub_corpus.append(corpus[i])
    data_sub_tfidf = vec.fit_transform(sub_corpus) 
    data_transform = pca.fit_transform(data_sub_tfidf) 
    #Append array
    sub_corpus = []

I have also used np.row_stack but nothing else gives me a value of (175, 2) which is the second array I want to concatenate.

like image 869
Luis Miguel Avatar asked Sep 24 '19 15:09

Luis Miguel


2 Answers

What @hpaulj was trying to say with

Stick with list append when doing loops.

is

#use a normal list
result_arr = []

for label in labels_set:

    data_transform = pca.fit_transform(data_sub_tfidf) 

    # append the data_transform object to that list
    # Note: this is not np.append(), which is slow here
    result_arr.append(data_transform)

# and stack it after the loop
# This prevents slow memory allocation in the loop. 
# So only one large chunk of memory is allocated since
# the final size of the concatenated array is known.

result_arr = np.concatenate(result_arr)

# or 
result_arr = np.stack(result_arr, axis=0)

# or
result_arr = np.vstack(result_arr)

Your arrays don't really have different dimensions. They have one different dimension, the other one is identical. And in that case you can always stack along the "different" dimension.

like image 54
Joe Avatar answered Nov 14 '22 23:11

Joe


Using concatenate, initializing "c":

a = np.array([[8,3,1],[2,5,1],[6,5,2]])
b = np.array([[2,5,1],[2,5,2]])
matrix = [a,b]

c = np.empty([0,matrix[0].shape[1]])

for v in matrix:
    c = np.append(c, v, axis=0)

Output:

[[8. 3. 1.]
 [2. 5. 1.]
 [6. 5. 2.]
 [2. 5. 1.]
 [2. 5. 2.]]
like image 44
Manuel Avatar answered Nov 14 '22 23:11

Manuel