I'm trying to make a program that finds consecutive rows that meet some conditions. For example, if there's a dataframe that looks like this:
df = pd.DataFrame([1,1,2,-13,-4,-5,6,17,8,9,-10,-11,-12,-13,14,15],
index=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],
columns=['value'])
>>> df
value
0 1
1 1
2 2
3 -13
4 -4
5 -5
6 6
7 17
8 8
9 9
10 -10
11 -11
12 -12
13 -13
14 -14
15 15
I want it to return a dataframe that shows rows that meet the conditions below:
1) the order has to be (positive rows)
and (negative rows)
, not the other way around.
2) each positive or negative group of rows has to have at least 3 rows
3) positive and negatives groups have to be adjacent to each other
posIdx, negIdx, posLength, negLength
0 2 3 3 3 # (1,1,2) (-13,-4,-5)
1 9 10 4 5 # (6,17,8,9) (-10,-11,-12,-13,-14)
Are there any simple ways to do this using python or pandas commands?
I create helper columns for easy verify solution:
#column for negative and positive
df['sign'] = np.where(df['value'] < 0, 'neg','pos')
#consecutive groups
df['g'] = df['sign'].ne(df['sign'].shift()).cumsum()
#removed groups with length more like 2
df = df[df['g'].map(df['g'].value_counts()).gt(2)]
#tested if order `pos-neg` of groups, if not removed groups
m1 = df['sign'].eq('pos') & df['sign'].shift(-1).eq('neg')
m2 = df['sign'].eq('neg') & df['sign'].shift().eq('pos')
groups = df.loc[m1 | m2, 'g']
df = df[df['g'].isin(groups)].copy()
df['pairs'] = (df['sign'].ne(df['sign'].shift()) & df['sign'].eq('pos')).cumsum()
print (df)
value sign g pairs
0 1 pos 1 1
1 1 pos 1 1
2 2 pos 1 1
3 -13 neg 2 1
4 -4 neg 2 1
5 -5 neg 2 1
6 6 pos 3 2
7 17 pos 3 2
8 8 pos 3 2
9 9 pos 3 2
10 -10 neg 4 2
11 -11 neg 4 2
12 -12 neg 4 2
13 -13 neg 4 2
Last aggregate GroupBy.first
for all groups and counts by GroupBy.size
and named aggregation (pandas 0.25+), sorting columns and flatten MultiIndex, last correct Idx_pos
for subtract 1
:
df1 = (df.reset_index()
.groupby(['pairs','g', 'sign'])
.agg(Idx=('index','first'), Length=('sign','size'))
.reset_index(level=1, drop=True)
.unstack()
.sort_index(axis=1, level=[0,1], ascending=[True, False])
)
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')
df1['Idx_pos'] = df1['Idx_neg'] - 1
print (df1)
Idx_pos Idx_neg Length_pos Length_neg
pairs
1 2 3 3 3
2 9 10 4 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With