
Splitting Pandas dataframe into multiple dataframes based on condition in column

To prepare my data for an ML task, I need to split my original dataframe into multiple smaller dataframes. For every occurrence of 1 in column 'BOOL', I want all the rows above and including the row where the value is 1, i.e. n dataframes, where n is the number of occurrences of 1.

A sample of the data:

import pandas as pd

df = pd.DataFrame({"USER_ID": ['001', '001', '001', '001', '001'],
                   "VALUE": [1, 2, 3, 4, 5],
                   "BOOL": [0, 1, 0, 1, 0]})

The expected output is two dataframes, as shown:

[image: first expected dataframe]

And:

[image: second expected dataframe]

I have considered a for loop with if-else statements that appends rows, but it is highly inefficient for the dataset I am using. I am looking for a more Pythonic way of doing this.

DaytaSigntist asked Jun 11 '26 21:06


2 Answers

You can use np.split, which accepts an array of indices at which to split:

np.split(df, *np.where(df.BOOL == 1))

If you want each row with BOOL == 1 to be included at the end of the preceding dataframe, just add 1 to all the indices:

np.split(df, np.where(df.BOOL == 1)[0] + 1)
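As a rough illustration of what the second call produces on the sample data, here is a minimal sketch that builds the same chunks with plain positional slicing. It assumes the sample dataframe from the question; the names cuts, bounds and chunks are made up for this example. Slicing with iloc is used because, depending on the pandas version, passing a DataFrame to np.split may emit a warning or fail.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"USER_ID": ['001', '001', '001', '001', '001'],
                   "VALUE": [1, 2, 3, 4, 5],
                   "BOOL": [0, 1, 0, 1, 0]})

# Positions just after each row where BOOL == 1.
cuts = np.flatnonzero(df["BOOL"].to_numpy() == 1) + 1

# Pair consecutive boundaries, e.g. [0, 2, 4, 5] -> (0, 2), (2, 4), (4, 5).
bounds = [0, *cuts.tolist(), len(df)]

# One non-overlapping chunk per pair; empty chunks are skipped.
chunks = [df.iloc[start:stop]
          for start, stop in zip(bounds[:-1], bounds[1:])
          if start < stop]

for chunk in chunks:
    print(chunk, end="\n\n")
```

Note that the chunks do not overlap, and the rows after the last 1 form a trailing chunk of their own. If only the chunks that end in a 1 are wanted, that last one can be dropped.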
a_guest answered Jun 14 '26 17:06


I think using a for loop is better here:

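# Positions of the rows where BOOL is non-zero
# (Series.nonzero() was removed in pandas 1.0, so go through NumPy)
idx = df.BOOL.to_numpy().nonzero()[0]

# One dataframe per occurrence: all rows up to and including that row
d = {x: df.iloc[:y + 1, :] for x, y in enumerate(idx)}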
d[0]
   BOOL USER_ID  VALUE
0     0     001      1
1     1     001      2
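To make the behaviour of this approach on the full sample explicit, here is a small sketch that wraps the same idea in a function. The function name prefixes_up_to_flag and its column parameter are hypothetical, chosen only for this illustration. Each result is a prefix of the original dataframe, so later results contain the earlier ones.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"USER_ID": ['001', '001', '001', '001', '001'],
                   "VALUE": [1, 2, 3, 4, 5],
                   "BOOL": [0, 1, 0, 1, 0]})


def prefixes_up_to_flag(frame, column="BOOL"):
    """Return one prefix of `frame` per row where `column` is non-zero."""
    positions = frame[column].to_numpy().nonzero()[0]
    # iloc slices are positional, so this works for any index labels.
    return [frame.iloc[:pos + 1] for pos in positions]


prefixes = prefixes_up_to_flag(df)

# The second prefix covers rows 0 through 3 (VALUE 1 to 4).
print(prefixes[1])
```

With the sample data this gives two dataframes: rows 0 to 1 and rows 0 to 3. That is the "all rows above and including" reading of the question, as opposed to non-overlapping chunks.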
BENY answered Jun 14 '26 15:06


