I have a 120,000 x 4 numpy array as shown below. Each row is a sample. The first column is time in seconds, or the index
in Pandas terminology.
0.014 14.175 -29.97 -22.68
0.022 13.905 -29.835 -22.68
0.030 12.257 -29.32 -22.67
... ...
1259.980 -0.405 2.205 3.825
1259.991 -0.495 2.115 3.735
I want to select the rows recorded between 100.000 and 200.000 sec and save them into a new array. If this were a Pandas dataframe, I would simply write df.loc[100:200]. What is the equivalent operation in numpy?
This is NOT a question of feasibility. I simply wonder if there are any pythonic one-line solutions.
pandas.DataFrame.loc is a property used to access a group of rows and columns by label(s) or a boolean array. .loc[] is primarily label based, but may also be used with a boolean array. A Pandas DataFrame is a two-dimensional tabular data structure with labeled axes, i.e. columns and rows.
Comparison between DataFrame and array: NumPy arrays can be multi-dimensional, whereas a DataFrame can only be two-dimensional. An array holds elements of a single data type, whereas a DataFrame can hold columns of different data types. Both arrays and DataFrames are mutable.
The result of .at is a single element, whereas the result of .loc may be a Series or a DataFrame. Returning a single value is not always the case, though: an array of values is returned if the provided index label occurs multiple times.
If you’re familiar with calling methods in Python, this should be very familiar. Essentially, you use “dot notation” to call loc[] after specifying a Pandas DataFrame. So first, you specify a Pandas DataFrame object.
The Pandas loc indexer enables you to select data from a Pandas DataFrame by label. It allows you to “locate” data in a DataFrame. That’s where we get the name loc[].
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects. This is the primary data structure of Pandas.
The loc property gets, or sets, the value(s) of the specified labels. Specify both row and column with a label. To access more than one row, use double brackets and specify the labels, separated by commas. You can also specify a slice of the DataFrame with from and to labels, separated by a colon.
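The three access patterns just described can be sketched as follows. The frame, its labels, and its column names are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Hypothetical frame with string row labels, for illustration only.
df = pd.DataFrame(
    {"age": [50, 40, 30], "score": [1.5, 2.5, 3.5]},
    index=["ann", "bob", "cat"],
)

single = df.loc["bob", "age"]      # one row label and one column label
several = df.loc[["ann", "cat"]]   # list of labels (the "double brackets")
label_slice = df.loc["ann":"bob"]  # label slice; both endpoints are included
```

Unlike ordinary Python slicing, a .loc label slice includes its end label.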
This assumes indexes are sorted:
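One way to exploit a sorted time column is a binary search with np.searchsorted. This is only a sketch on synthetic data, not necessarily the code the answer originally had in mind; the array contents and the names data, start, stop and window are assumptions.

```python
import numpy as np

# Synthetic stand-in for the 120,000 x 4 array: sorted times in column 0.
rng = np.random.default_rng(0)
t = np.linspace(0.014, 1259.991, 120_000)
data = np.column_stack([t, rng.normal(size=(t.size, 3))])

# Binary search for the window boundaries; valid only if column 0 is sorted.
start = np.searchsorted(data[:, 0], 100.0, side="left")
stop = np.searchsorted(data[:, 0], 200.0, side="right")
window = data[start:stop]  # a view; call .copy() for an independent array
```

With side="left" for the lower bound and side="right" for the upper bound, both 100.0 and 200.0 are included, which mirrors the inclusive behaviour of df.loc[100:200].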
IIUC,
import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])
# keep the rows whose first column lies in [5, 9]
x[(x[:, 0] >= 5) & (x[:, 0] <= 9)]
So you would have 100 and 200 instead of 5 and 9.
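Applied to data shaped like the question's, the same mask might look like the sketch below. The sample values are invented; only the layout (time in column 0) follows the question.

```python
import numpy as np

# Small hypothetical sample shaped like the question's data (time in column 0).
data = np.array([
    [ 99.990, 0.1, 0.2, 0.3],
    [100.000, 0.4, 0.5, 0.6],
    [150.500, 0.7, 0.8, 0.9],
    [200.000, 1.0, 1.1, 1.2],
    [200.010, 1.3, 1.4, 1.5],
])

t = data[:, 0]
selected = data[(t >= 100.0) & (t <= 200.0)]  # boolean indexing returns a copy
```

The mask does not require the time column to be sorted, and both ends are inclusive here, like a .loc label slice.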
For a more general solution, check Wen's answer.
Data from Raf
x[np.where(x[:,0]==5)[0][0]:np.where(x[:,0]==9)[0][0]+1,:]
Out[341]:
array([[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
Notice:
comparing with greater-than and less-than alone cannot fully replace .loc, because a .loc slice is resolved by the positions of the labels in the index, not by a range of values.
For example
df
Out[348]:
       0   1   2   3
0      1   2   3   4
1      5   6   7   8
4444   9  10  11  12
3     13  14  15  16
df.loc[1:3]
Out[347]:
       0   1   2   3
1      5   6   7   8
4444   9  10  11  12
3     13  14  15  16
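To make the difference concrete, the sketch below rebuilds the frame shown above and compares the label slice with a value-range mask on the index. The variable names by_label and by_value are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame shown above, with the unsorted index 0, 1, 4444, 3.
df = pd.DataFrame(np.arange(1, 17).reshape(4, 4), index=[0, 1, 4444, 3])

by_label = df.loc[1:3]                            # from label 1 through label 3
by_value = df[(df.index >= 1) & (df.index <= 3)]  # index values within [1, 3]

print(list(by_label.index))  # [1, 4444, 3]
print(list(by_value.index))  # [1, 3]
```

The label slice keeps row 4444 because it sits between labels 1 and 3 in the index, while the value-range mask drops it. On a sorted index the two selections coincide.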