I have a pandas DataFrame with the following format:
year col1
y1 val_1
y1 val_2
y1 val_3
y2 val_4
y2 val_5
y2 val_6
y3 val_7
y3 val_8
y3 val_9
How do I select only the rows up to year y2 and omit year y3?
I need a new DataFrame as follows:
year col1
y1 val_1
y1 val_2
y1 val_3
y2 val_4
y2 val_5
y2 val_6
y1, y2, y3 represent year values.
On your sample dataset the following works:
In [35]:
df.iloc[0:df[df.year == 'y3'].index[0]]
Out[35]:
year col1
0 y1 val_1
1 y1 val_2
2 y1 val_3
3 y2 val_4
4 y2 val_5
5 y2 val_6
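To reproduce this outside the session, here is a minimal sketch that rebuilds the sample frame and runs the same slice. It assumes the default integer index (0 to 8), which the output above suggests; the names first_y3 and new_df are only illustrative.

```python
import pandas as pd

# Rebuild the sample data from the question.
# Assumption: the frame has the default RangeIndex 0..8.
df = pd.DataFrame({
    "year": ["y1"] * 3 + ["y2"] * 3 + ["y3"] * 3,
    "col1": [f"val_{i}" for i in range(1, 10)],
})

# Index label of the first row whose year is 'y3'.
first_y3 = df[df.year == "y3"].index[0]

# With a default index the label equals the row position,
# so it can be used as the stop value of a positional slice.
new_df = df.iloc[0:first_y3]
print(new_df)
```

Because iloc slices by position, this only lines up when the index labels and row positions coincide.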
So breaking this down, we perform a boolean index to find the rows where year equals 'y3':
In [36]:
df[df.year == 'y3']
Out[36]:
year col1
6 y3 val_7
7 y3 val_8
8 y3 val_9
But we are interested in the index, so that we can use it for slicing:
In [37]:
df[df.year == 'y3'].index
Out[37]:
Int64Index([6, 7, 8], dtype='int64')
We only need the first value for slicing, hence the call to index[0]. However, if your df is already sorted by year value, then just performing df[df.year < 'y3'] would be simpler and work.
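If the index is not the default 0..n-1, the label returned by index[0] is no longer a row position, and the iloc slice may pick the wrong rows. The sketch below shows a few alternatives that filter on the year values themselves. The shifted index (10 to 18) and the variable names are assumptions made up for illustration, not part of the question.

```python
import pandas as pd

# Same sample data, but with a non-default index (hypothetical, for illustration).
df = pd.DataFrame(
    {
        "year": ["y1"] * 3 + ["y2"] * 3 + ["y3"] * 3,
        "col1": [f"val_{i}" for i in range(1, 10)],
    },
    index=range(10, 19),
)

# Option 1: boolean mask that drops every 'y3' row, sorted or not.
not_y3 = df[df.year != "y3"]

# Option 2: keep only an explicit list of years.
keep = df[df.year.isin(["y1", "y2"])]

# Option 3: positional slice up to the first 'y3' row.
# argmax gives the position of the first True; it returns 0 when there is
# no 'y3' row at all, so this assumes at least one such row exists.
pos = (df.year == "y3").to_numpy().argmax()
head = df.iloc[:pos]

print(keep)
```

Note that df.year < 'y3' compares strings lexicographically, so it works for labels like 'y1' to 'y9' but would treat 'y10' as smaller than 'y3'; with real numeric years the comparison behaves as expected.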