I have the following Pandas DataFrame:
start_timestamp_milli end_timestamp_milli name rating
1 1555414708025 1555414723279 Valence 2
2 1555414708025 1555414723279 Arousal 6
3 1555414708025 1555414723279 Dominance 2
4 1555414708025 1555414723279 Sadness 1
5 1555414708025 1555414723279 Happiness 0
6 1555414708025 1555414723279 Anger 0
7 1555414708025 1555414723279 Surprise 0
8 1555414708025 1555414723279 Stress 0
9 1555414813304 1555414831795 Valence 3
10 1555414813304 1555414831795 Arousal 5
11 1555414813304 1555414831795 Dominance 2
12 1555414813304 1555414831795 Sadness 0
13 1555414813304 1555414831795 Happiness 0
14 1555414813304 1555414831795 Anger 0
15 1555414708025 1555414723279 Surprise 0
16 1555414708025 1555414723279 Stress 0
17 1555414921819 1555414931382 Valence 1
18 1555414921819 1555414931382 Arousal 7
19 1555414921819 1555414931382 Dominance 2
20 1555414921819 1555414931382 Sadness 1
21 1555414921819 1555414931382 Happiness 0
22 1555414921819 1555414931382 Anger 1
23 1555414708025 1555414723279 Surprise 0
24 1555414708025 1555414723279 Stress 1
Now, for each block with the same start_timestamp_milli and end_timestamp_milli, I would like to insert an additional row with name "Neutral" and a rating of 1 if the rating of Sadness, Happiness, Anger, Surprise and Stress is 0 and otherwise 0. The start_timestamp_milli and end_timestamp_milli of the new row should be set to the values of that block.
The resulting DataFrame should look like this:
start_timestamp_milli end_timestamp_milli name rating
1 1555414708025 1555414723279 Valence 2
2 1555414708025 1555414723279 Arousal 6
3 1555414708025 1555414723279 Dominance 2
4 1555414708025 1555414723279 Sadness 1
5 1555414708025 1555414723279 Happiness 0
6 1555414708025 1555414723279 Anger 0
7 1555414708025 1555414723279 Surprise 0
8 1555414708025 1555414723279 Stress 0
9 1555414708025 1555414723279 Neutral 0
10 1555414813304 1555414831795 Valence 3
11 1555414813304 1555414831795 Arousal 5
12 1555414813304 1555414831795 Dominance 2
13 1555414813304 1555414831795 Sadness 0
14 1555414813304 1555414831795 Happiness 0
15 1555414813304 1555414831795 Anger 0
16 1555414708025 1555414723279 Surprise 0
17 1555414708025 1555414723279 Stress 0
18 1555414708025 1555414723279 Neutral 1
19 1555414921819 1555414931382 Valence 1
20 1555414921819 1555414931382 Arousal 7
21 1555414921819 1555414931382 Dominance 2
22 1555414921819 1555414931382 Sadness 1
23 1555414921819 1555414931382 Happiness 0
24 1555414921819 1555414931382 Anger 1
25 1555414708025 1555414723279 Surprise 0
26 1555414708025 1555414723279 Stress 1
27 1555414708025 1555414723279 Neutral 0
How can this be done?
You can do with filter before groupby agg + all , then concat back the result
s=df.loc[df.name.isin(['Sadness', 'Happiness', 'Anger', 'Surprise' , 'Stress']),'rating'].\
eq(0).\
groupby([df['start_timestamp_milli'],df['end_timestamp_milli']]).\
agg('all').reset_index().assign(name='Neutral')
df=pd.concat([df,s],sort=False).sort_values(['start_timestamp_milli','end_timestamp_milli'])
df
Out[66]:
start_timestamp_milli end_timestamp_milli name rating
1 1555414708025 1555414723279 Valence 2
2 1555414708025 1555414723279 Arousal 6
3 1555414708025 1555414723279 Dominance 2
4 1555414708025 1555414723279 Sadness 1
5 1555414708025 1555414723279 Happiness 0
6 1555414708025 1555414723279 Anger 0
7 1555414708025 1555414723279 Surprise 0
8 1555414708025 1555414723279 Stress 0
15 1555414708025 1555414723279 Surprise 0
16 1555414708025 1555414723279 Stress 0
23 1555414708025 1555414723279 Surprise 0
24 1555414708025 1555414723279 Stress 1
0 1555414708025 1555414723279 Neutral 0
9 1555414813304 1555414831795 Valence 3
10 1555414813304 1555414831795 Arousal 5
11 1555414813304 1555414831795 Dominance 2
12 1555414813304 1555414831795 Sadness 0
13 1555414813304 1555414831795 Happiness 0
14 1555414813304 1555414831795 Anger 0
1 1555414813304 1555414831795 Neutral 1
17 1555414921819 1555414931382 Valence 1
18 1555414921819 1555414931382 Arousal 7
19 1555414921819 1555414931382 Dominance 2
20 1555414921819 1555414931382 Sadness 1
21 1555414921819 1555414931382 Happiness 0
22 1555414921819 1555414931382 Anger 1
2 1555414921819 1555414931382 Neutral 0
I would pivot first, which makes the lookup a lot easier, and then adjust your initial DataFrame.
f = (df.pivot_table(index=['start_timestamp_milli', 'end_timestamp_milli'],
columns='name', values='rating', aggfunc='any', fill_value=0).astype(int))
Now to find the timestamp combinations:
cols = ['Sadness', 'Happiness', 'Anger', 'Surprise', 'Stress']
appd = pd.Series(np.where(f[cols].any(1), 0, 1), index=f.index)
res = pd.concat([df, appd.rename('rating').reset_index().assign(name='Neutral')])
end_timestamp_milli name rating start_timestamp_milli
1 1555414723279 Valence 2 1555414708025
2 1555414723279 Arousal 6 1555414708025
3 1555414723279 Dominance 2 1555414708025
4 1555414723279 Sadness 1 1555414708025
5 1555414723279 Happiness 0 1555414708025
6 1555414723279 Anger 0 1555414708025
7 1555414723279 Surprise 0 1555414708025
8 1555414723279 Stress 0 1555414708025
9 1555414831795 Valence 3 1555414813304
10 1555414831795 Arousal 5 1555414813304
11 1555414831795 Dominance 2 1555414813304
12 1555414831795 Sadness 0 1555414813304
13 1555414831795 Happiness 0 1555414813304
14 1555414831795 Anger 0 1555414813304
15 1555414723279 Surprise 0 1555414708025
16 1555414723279 Stress 0 1555414708025
17 1555414931382 Valence 1 1555414921819
18 1555414931382 Arousal 7 1555414921819
19 1555414931382 Dominance 2 1555414921819
20 1555414931382 Sadness 1 1555414921819
21 1555414931382 Happiness 0 1555414921819
22 1555414931382 Anger 1 1555414921819
23 1555414723279 Surprise 0 1555414708025
24 1555414723279 Stress 1 1555414708025
0 1555414723279 Neutral 0 1555414708025
1 1555414831795 Neutral 1 1555414813304
2 1555414931382 Neutral 0 1555414921819
This will add them to the end of the Frame, you would need to sort if you want them intermittently placed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With