
selecting from parallel arrays

I have many (1000+) large (1,000,000+ records each) data files with time, x, y, z data.

I used numpy.loadtxt against a sample file, to populate four parallel arrays; e.g.,

ts, xs, ys, zs = numpy.loadtxt( 'sampledatafile.csv', delimiter=',', unpack=True)

I want to select a subset of these parallel arrays, where the time is in a specified range; e.g.,

min_time = t0  # some time, in the same format as values in the data file
max_time = t1  # a later time

I have been able to do this, by iterating through the ts array; like this,

my_ts = []
my_xs = []
my_ys = []
my_zs = []

for row in range( len( ts ) ):
    if ( min_time <= ts[row] ) and ( ts[row] <= max_time ):
        my_ts.append( ts[row] )
        my_xs.append( xs[row] )
        my_ys.append( ys[row] )
        my_zs.append( zs[row] )

Is there a more efficient way to do this? I figure another approach is to read each record with a csv file reader, checking each record as it goes by, instead of using numpy.loadtxt.

But surely there is a more clever way in Python? Something like, "select all records in the ts array meeting the criteria, and the associated elements in the parallel arrays"? Is there a clever, cool syntax for this, especially one that is more efficient than the approach(es) above?

Bruce Simonson asked Jun 25 '26 00:06



1 Answer

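import numpy

# Load the whole file as one 2-D array: one row per record, columns are time, x, y, z.
arr = numpy.loadtxt('sampledatafile.csv', delimiter=',')
ts = arr[:, 0]
# Boolean mask: True for rows whose time lies in [min_time, max_time].
idx = (ts >= min_time) & (ts <= max_time)
my_ts, my_xs, my_ys, my_zs = arr[idx].T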
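
The same boolean mask should also work directly on the four parallel arrays from the question, without reloading the file as a single 2-D array. A minimal sketch, assuming the arrays all have the same length; the sample values and the time bounds here are made up for illustration and stand in for the columns returned by numpy.loadtxt:

```python
import numpy

# Hypothetical sample data standing in for the columns loaded from the CSV file.
ts = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
xs = numpy.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])
ys = numpy.array([20.0, 21.0, 22.0, 23.0, 24.0, 25.0])
zs = numpy.array([30.0, 31.0, 32.0, 33.0, 34.0, 35.0])

# Hypothetical time window (inclusive on both ends).
min_time, max_time = 1.5, 4.0

# One boolean array, True where the time falls inside the window.
mask = (ts >= min_time) & (ts <= max_time)

# Indexing each parallel array with the same mask keeps the records aligned.
my_ts, my_xs, my_ys, my_zs = ts[mask], xs[mask], ys[mask], zs[mask]
```

Because the comparison and the selection run inside NumPy rather than in a Python loop, this is typically much faster than appending to lists row by row.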

If you would like to sort your array by ts first, you could also use numpy.argsort:

idx = numpy.argsort(ts)  # indices that would put the rows in time order
arr = arr[idx]
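
Once the rows are in time order, the time window becomes a contiguous block of rows, so one possible follow-up is to find its two ends with numpy.searchsorted and take a slice instead of building a mask. This is only a sketch under the assumption that the data is sorted by time first; the sample array and bounds are invented for illustration:

```python
import numpy

# Hypothetical unsorted data: columns are time, x, y, z.
arr = numpy.array([
    [3.0, 30.0, 300.0, 3000.0],
    [1.0, 10.0, 100.0, 1000.0],
    [4.0, 40.0, 400.0, 4000.0],
    [2.0, 20.0, 200.0, 2000.0],
    [5.0, 50.0, 500.0, 5000.0],
])

# Hypothetical time window (inclusive on both ends).
min_time, max_time = 2.0, 4.0

# Sort the rows by the time column.
arr = arr[numpy.argsort(arr[:, 0])]
ts = arr[:, 0]

# Binary search for the first row with ts >= min_time
# and the first row with ts > max_time.
lo = numpy.searchsorted(ts, min_time, side='left')
hi = numpy.searchsorted(ts, max_time, side='right')

# The selected records are the contiguous rows lo..hi-1.
my_ts, my_xs, my_ys, my_zs = arr[lo:hi].T
```

Whether this beats the mask depends on whether the files are already sorted: the sort itself costs more than a single mask pass, so the slice mainly pays off when many windows are taken from the same sorted array.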
unutbu answered Jun 26 '26 13:06



