I have many (1000+) large (1,000,000+ records each) data files containing time, x, y, and z data.
I used numpy.loadtxt against a sample file, to populate four parallel arrays; e.g.,
ts, xs, ys, zs = numpy.loadtxt( 'sampledatafile.csv', delimiter=',', unpack=True)
I want to select a subset of these parallel arrays, where the time is in a specified range; e.g.,
min_time = t0 # some time, in the same format as values in the data file
max_time = t1 # a later time
I have been able to do this by iterating through the ts array, like this:
my_ts = []
my_xs = []
my_ys = []
my_zs = []
for row in range( len( ts ) ):
    if ( min_time <= ts[row] ) and ( ts[row] <= max_time ):
        my_ts.append( ts[row] )
        my_xs.append( xs[row] )
        my_ys.append( ys[row] )
        my_zs.append( zs[row] )
Is there a more efficient way to do this? I figure another approach is to load each record with a csv file reader, checking each record as it goes by, instead of using numpy.loadtxt.
But surely there is a more clever way in Python? Something like, "select all records in the ts array meeting the criteria, and the associated elements in the parallel arrays"? Is there a clever, cool syntax for this, especially one that is more efficient than the approach(es) above?
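import numpy
# Load the whole file as one 2-D array: a row per record, columns t, x, y, z.
arr = numpy.loadtxt( 'sampledatafile.csv', delimiter=',')
ts = arr[:, 0]
# Boolean mask: True for rows whose time lies in [min_time, max_time].
idx = (ts >= min_time) & (ts <= max_time)
# Keep the matching rows, then transpose to unpack four parallel arrays.
my_ts, my_xs, my_ys, my_zs = arr[idx].T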
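As a minimal, self-contained sketch of the boolean-mask selection above, the following uses a small in-memory CSV in place of a real data file. The sample values and the helper name select_time_range are made up for illustration.

```python
import io

import numpy


def select_time_range(arr, min_time, max_time):
    """Return the rows of arr whose first column lies in [min_time, max_time]."""
    ts = arr[:, 0]
    # Element-wise AND of two boolean arrays, one entry per row.
    mask = (ts >= min_time) & (ts <= max_time)
    return arr[mask]


# Hypothetical sample data: columns are time, x, y, z.
sample = io.StringIO(
    "0.0,1.0,2.0,3.0\n"
    "1.0,1.1,2.1,3.1\n"
    "2.0,1.2,2.2,3.2\n"
    "3.0,1.3,2.3,3.3\n"
)
arr = numpy.loadtxt(sample, delimiter=',')

selected = select_time_range(arr, 1.0, 2.0)
# Transpose so each column unpacks into its own parallel array.
my_ts, my_xs, my_ys, my_zs = selected.T
```

The parentheses around each comparison are required, because & binds more tightly than >= and <=.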
If you would like to sort your array by ts first, you could also use numpy.argsort:
idx = numpy.argsort(ts)  # indices that would put ts in ascending order
arr = arr[idx]  # reorder whole rows, so the columns stay aligned
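One possible reason to sort by time first: on a sorted array the selection becomes a contiguous slice, and numpy.searchsorted can find its bounds by binary search instead of comparing every element. This is a sketch of that alternative, not something the answer above spells out, and the sample values are made up.

```python
import numpy

# Hypothetical unsorted sample data: columns are time, x, y, z.
arr = numpy.array([
    [3.0, 1.3, 2.3, 3.3],
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 1.2, 2.2, 3.2],
    [1.0, 1.1, 2.1, 3.1],
])
min_time, max_time = 1.0, 2.0

# Sort rows by the time column.
arr = arr[numpy.argsort(arr[:, 0])]
ts = arr[:, 0]

# Binary search for the slice bounds; side='right' keeps rows equal to max_time.
lo = numpy.searchsorted(ts, min_time, side='left')
hi = numpy.searchsorted(ts, max_time, side='right')

my_ts, my_xs, my_ys, my_zs = arr[lo:hi].T
```

Here arr[lo:hi] is a view of arr rather than a copy, whereas indexing with a boolean mask returns a copy.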