selecting from parallel arrays

Question

I have many (1000+) large (1,000,000+ records each) data files containing time, x, y, z data.

I used numpy.loadtxt on a sample file to populate four parallel arrays; e.g.,

import numpy
ts, xs, ys, zs = numpy.loadtxt( 'sampledatafile.csv', delimiter=',', unpack=True)

I want to select a subset of these parallel arrays, where the time is in a specified range; e.g.,

min_time = t0  # some time, in the same format as values in the data file
max_time = t1  # a later time

I have been able to do this by iterating through the ts array, like this:

my_ts = []
my_xs = []
my_ys = []
my_zs = []

for row in range( len( ts ) ):
    if ( min_time <= ts[row] ) and ( ts[row] <= max_time ):
        my_ts.append( ts[row] )
        my_xs.append( xs[row] )
        my_ys.append( ys[row] )
        my_zs.append( zs[row] )

Is there a more efficient way to do this? I figure another approach is to read each record with a csv file reader, checking each record as it goes by, instead of using numpy.loadtxt.

But surely there is a more clever way in Python? Something like, "select all records in the ts array meeting the criteria, and the associated elements in the parallel arrays"? Is there a clever, cool syntax for this, especially one that is more efficient than the approach(es) above?

unutbu · Accepted Answer

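import numpy
# Load all records as one 2-D array with columns t, x, y, z.
arr = numpy.loadtxt( 'sampledatafile.csv', delimiter=',')
ts = arr[:, 0]
# Boolean mask: True where the time lies in [min_time, max_time].
idx = (ts >= min_time) & (ts <= max_time)
# Keep the matching rows; transposing unpacks them into four parallel arrays.
my_ts, my_xs, my_ys, my_zs = arr[idx].T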
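
To see what the mask does, here is a minimal sketch on a tiny in-memory dataset. The sample values, the io.StringIO stand-in for the data file, and the helper name select_time_range are all made up for illustration; none of them come from the question's data or from numpy itself.

```python
import io
import numpy

def select_time_range(arr, min_time, max_time):
    """Return the rows of arr whose first column lies in [min_time, max_time].

    Hypothetical helper for illustration, not part of numpy.
    """
    ts = arr[:, 0]
    mask = (ts >= min_time) & (ts <= max_time)
    return arr[mask]

# Made-up sample data standing in for sampledatafile.csv (columns t, x, y, z).
sample = io.StringIO(
    "0.0,1.0,2.0,3.0\n"
    "1.0,1.1,2.1,3.1\n"
    "2.0,1.2,2.2,3.2\n"
    "3.0,1.3,2.3,3.3\n"
)
arr = numpy.loadtxt(sample, delimiter=',')

# Keep only the records with 1.0 <= t <= 2.0 and unpack the columns.
my_ts, my_xs, my_ys, my_zs = select_time_range(arr, 1.0, 2.0).T
print(my_ts)  # [1. 2.]
print(my_xs)  # [1.1 1.2]
```

The parentheses around each comparison are needed because & binds more tightly than >= and <=.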

If you would like to sort your array by ts first, you can also use numpy.argsort:

# Row order that sorts the records by time.
idx = numpy.argsort(ts)
arr = arr[idx]
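
Once the rows are sorted by time, one option is to find the range boundaries with numpy.searchsorted and take a slice instead of building a mask. This is a swapped-in technique, not something the answer itself uses. A minimal sketch, with made-up records and assumed example bounds:

```python
import numpy

# Made-up records (t, x, y, z), deliberately out of time order.
arr = numpy.array([
    [3.0, 1.3, 2.3, 3.3],
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.2, 2.2, 3.2],
    [1.0, 1.1, 2.1, 3.1],
])
min_time, max_time = 1.0, 2.0  # assumed example bounds

# Sort the rows by the time column, as in the answer.
arr = arr[numpy.argsort(arr[:, 0])]
ts = arr[:, 0]

# Binary search: first row with t >= min_time, and first row with t > max_time.
lo = numpy.searchsorted(ts, min_time, side='left')
hi = numpy.searchsorted(ts, max_time, side='right')

# A plain slice is a view of arr, so the selected rows are not copied.
my_ts, my_xs, my_ys, my_zs = arr[lo:hi].T
print(my_ts)  # [1. 2.]
```

This is likely to pay off mainly when several time ranges are selected from the same file, since the sort is done only once.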

Tags: python, arrays, parallel-processing, numpy, selection

Bruce Simonson
