How to efficiently expand arrays in Python?

My question is how to efficiently expand an array by copying its rows many times. I am trying to expand my survey sample to the full-size dataset by copying every sample N times, where N is the influence factor assigned to that sample. So I wrote two loops to do this (script pasted below). It works, but it is slow: my sample size is 20,000, and I am trying to expand it to the full size of 3 million. Is there a function I can try? Thank you for your help!

----My script----

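import numpy as np

# person is an open text file: space-separated fields, influence factor in the last column
lines = np.asarray(person.read().split('\n'))
df_array = np.asarray(lines[0].split(' '))   # the first line starts the array
for j in range(1, len(lines) - 1):            # len - 1 skips the trailing empty line
    subarray = np.asarray(lines[j].split(' '))
    factor = int(round(float(subarray[-1]), 0))
    for i in range(1, factor):                # appends factor - 1 further copies
        # vstack copies the whole array on every call, so total work grows quadratically
        df_array = np.vstack((df_array, subarray))
print(len(df_array))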
asked by Angela Y, Sep 26 '22 21:09



2 Answers

First, you can load all the data at once with numpy.loadtxt instead of splitting lines by hand.
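A minimal sketch of that loading step, assuming a whitespace-separated file with the factor in the last column. An in-memory io.StringIO with made-up values stands in for the real file here; pass the real file path instead. The ndmin=2 argument keeps the result two-dimensional even if the file has only one row.

```python
import io
import numpy as np

# Hypothetical stand-in for the survey file: space-separated columns,
# influence factor in the last column.
survey_file = io.StringIO("1 2 3\n4 5 6\n")

# loadtxt splits on whitespace by default and returns a float array;
# ndmin=2 guarantees a 2-D result (rows x columns).
data = np.loadtxt(survey_file, ndmin=2)
print(data.shape)  # (2, 3)
```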

Then, to repeat according to the last column, use numpy.repeat:

>>> data = np.array([[1, 2, 3],
...                  [4, 5, 6]])
>>> np.repeat(data, data[:,-1], axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6]])

Finally, if the last column holds non-integer factors that need rounding, replace data[:,-1] with np.round(data[:,-1]).astype(int), because np.repeat needs integer counts.
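Putting the rounding and the repeat together, a sketch on made-up float factors (the sample values are hypothetical) might look like this:

```python
import numpy as np

# Hypothetical survey rows; the last column is a float factor that must be
# rounded to a whole number of copies before it can be used as a repeat count.
data = np.array([[1.0, 2.0, 2.4],
                 [4.0, 5.0, 0.6]])

counts = np.round(data[:, -1]).astype(int)   # array([2, 1])
expanded = np.repeat(data, counts, axis=0)   # a single vectorized call, no Python loop
print(expanded)
```

Note that np.round rounds exact halves to the nearest even number (2.5 becomes 2), which matches Python 3's built-in round but not Python 2's, which rounds halves away from zero.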

answered by eph, Oct 19 '22 04:10



Stacking numpy arrays over and over is not very efficient, because numpy arrays are not designed for dynamic growth like that. Every time you call vstack, it allocates a whole new block of memory for the size of your data at that point and copies all the existing rows into it.
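A small sketch on made-up rows that shows the difference: each vstack call produces a brand-new array that shares no memory with the previous one, while collecting the rows in a list and converting once allocates the final array a single time and gives the same result.

```python
import numpy as np

# Hypothetical rows standing in for parsed survey records.
rows = [np.array([i, i + 1, i + 2]) for i in range(4)]

# Growing with vstack: every call allocates a new array and copies all
# existing rows into it.
grown = rows[0]
for row in rows[1:]:
    previous = grown
    grown = np.vstack((grown, row))
    assert not np.shares_memory(previous, grown)  # a fresh buffer each time

# Collecting rows in a list and converting once allocates the final array once.
built_once = np.array(rows)

print(np.array_equal(grown, built_once))  # True
```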

Collect the rows in a list and build your array once at the end, for example with a generator like this:

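import numpy as np

def upsample(stream):
    """Yield each record `factor` times, where factor is its rounded last field."""
    for line in stream:
        rec = line.strip().split()
        if not rec:  # skip blank lines
            continue
        factor = int(round(float(rec[-1])))
        for _ in range(factor):
            yield rec

df_array = np.array(list(upsample(person)))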
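One thing to watch: the generator yields lists of strings, so the resulting array has a string dtype. A condensed, self-contained sketch of the same generator, run on a hypothetical two-record sample standing in for the person file, shows the dtype and a single conversion to numbers at the end:

```python
import io
import numpy as np

def upsample(stream):
    # Repeat each record by its rounded last-column factor.
    for line in stream:
        rec = line.split()
        if rec:
            for _ in range(int(round(float(rec[-1])))):
                yield rec

# Hypothetical sample: the first record has factor 3, the second factor 2.
person = io.StringIO("1 2 3\n4 5 2\n")
df_array = np.array(list(upsample(person)))
print(df_array.dtype.kind)         # 'U': the rows are still strings
numeric = df_array.astype(float)   # convert once at the end if numbers are needed
print(numeric.shape)               # (5, 3)
```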
answered by fivetentaylor, Oct 19 '22 02:10
