I have a data frame like this,
df
col1 col2 col3
1 2 3
2 5 6
7 8 9
10 11 12
11 12 13
13 14 15
14 15 16
Now I want to create multiple data frames from above when the col1 difference of two consecutive rows are more than 1. So the result data frames will look like,
df1
col1 col2 col3
1 2 3
2 5 6
df2
col1 col2 col3
7 8 9
df3
col1 col2 col3
10 11 12
11 12 13
df4
col1 col2 col3
13 14 15
14 15 16
I can do this using for loop and storing the indices but this will increase execution time, looking for some pandas shortcuts or pythonic way to do this most efficiently.
You could define a custom grouper by taking the diff, checking when it is greater than 1, and take the cumsum of the boolean series. Then group by the result and build a dictionary from the groupby object:
d = dict(tuple(df.groupby(df.col1.diff().gt(1).cumsum())))
print(d[0])
col1 col2 col3
0 1 2 3
1 2 5 6
print(d[1])
col1 col2 col3
2 7 8 9
A more detailed break-down:
df.assign(difference=(diff:=df.col1.diff()),
condition=(gt1:=diff.gt(1)),
grouper=gt1.cumsum())
col1 col2 col3 difference condition grouper
0 1 2 3 NaN False 0
1 2 5 6 1.0 False 0
2 7 8 9 5.0 True 1
3 10 11 12 3.0 True 2
4 11 12 13 1.0 False 2
5 13 14 15 2.0 True 3
6 14 15 16 1.0 False 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With