I have a pandas DataFrame with the following format:
year    col1 
y1      val_1 
y1      val_2
y1      val_3
y2      val_4
y2      val_5
y2      val_6
y3      val_7
y3      val_8
y3      val_9
How do I select only the values up to year 2 and omit year 3?
I need a new data frame as follows:
   year      col1 
    y1      val_1 
    y1      val_2
    y1      val_3
    y2      val_4
    y2      val_5
    y2      val_6
y1, y2, y3 represent year values 
On your sample dataset the following works:
In [35]:
df.iloc[0:df[df.year == 'y3'].index[0]]
Out[35]:
  year   col1
0   y1  val_1
1   y1  val_2
2   y1  val_3
3   y2  val_4
4   y2  val_5
5   y2  val_6
So breaking this down, we perform a boolean index to find the rows whose year equals 'y3':
In [36]:
df[df.year == 'y3']
Out[36]:
  year   col1
6   y3  val_7
7   y3  val_8
8   y3  val_9
but we are interested in the index so we can use this for slicing:
In [37]:
df[df.year == 'y3'].index
Out[37]:
Int64Index([6, 7, 8], dtype='int64')
But we only need the first value for slicing, hence the call to index[0]. However, if your df is already sorted by year value, then just performing df[df.year < 'y3'] would be simpler and work.
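To make the two approaches easy to compare, here is a minimal self-contained sketch that rebuilds the sample data and runs both. The variable names (first_y3, by_slice, by_compare, by_isin) are only illustrative. It assumes the frame has the default integer index and is sorted by year, which is what lets the index label of the first 'y3' row double as a position for iloc. The isin variant is an extra option not mentioned above, for cases where the year labels do not sort in a meaningful way.

```python
import pandas as pd

# Rebuild the sample frame from the question (default 0..8 integer index).
df = pd.DataFrame({
    "year": ["y1"] * 3 + ["y2"] * 3 + ["y3"] * 3,
    "col1": [f"val_{i}" for i in range(1, 10)],
})

# Approach 1: find the index label of the first 'y3' row and slice up to it.
# This relies on the default integer index and on the rows being sorted by year.
first_y3 = df[df.year == "y3"].index[0]
by_slice = df.iloc[0:first_y3]

# Approach 2: plain boolean comparison; here the labels are strings,
# so the comparison is lexicographic ('y1' < 'y2' < 'y3').
by_compare = df[df.year < "y3"]

# Extra option (an assumption, not from the answer): list the years to keep.
by_isin = df[df.year.isin(["y1", "y2"])]

print(by_slice)
```

On this sample all three selections return the same six rows. If the index is not the default one, calling df.reset_index(drop=True) first would restore the assumption that approach 1 depends on.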