Adding rows to DataFrame conditioned on values

Question

I have the following Pandas DataFrame:

     start_timestamp_milli  end_timestamp_milli       name  rating
1            1555414708025        1555414723279    Valence       2   
2            1555414708025        1555414723279    Arousal       6   
3            1555414708025        1555414723279  Dominance       2   
4            1555414708025        1555414723279    Sadness       1
5            1555414708025        1555414723279    Happiness     0
6            1555414708025        1555414723279    Anger         0
7            1555414708025        1555414723279    Surprise      0
8            1555414708025        1555414723279    Stress        0
9            1555414813304        1555414831795    Valence       3   
10           1555414813304        1555414831795    Arousal       5   
11           1555414813304        1555414831795  Dominance       2   
12           1555414813304        1555414831795    Sadness       0
13           1555414813304        1555414831795    Happiness     0
14           1555414813304        1555414831795    Anger         0
15           1555414708025        1555414723279    Surprise      0
16           1555414708025        1555414723279    Stress        0   
17           1555414921819        1555414931382    Valence       1   
18           1555414921819        1555414931382    Arousal       7   
19           1555414921819        1555414931382  Dominance       2   
20           1555414921819        1555414931382    Sadness       1 
21           1555414921819        1555414931382   Happiness      0  
22           1555414921819        1555414931382    Anger         1
23           1555414708025        1555414723279    Surprise      0
24           1555414708025        1555414723279    Stress        1

Now, for each block with the same start_timestamp_milli and end_timestamp_milli, I would like to insert an additional row with name "Neutral" and a rating of 1 if the rating of Sadness, Happiness, Anger, Surprise and Stress is 0 and otherwise 0. The start_timestamp_milli and end_timestamp_milli of the new row should be set to the values of that block.

The resulting DataFrame should look like this:

     start_timestamp_milli  end_timestamp_milli       name  rating
1            1555414708025        1555414723279    Valence       2   
2            1555414708025        1555414723279    Arousal       6   
3            1555414708025        1555414723279  Dominance       2   
4            1555414708025        1555414723279    Sadness       1
5            1555414708025        1555414723279    Happiness     0
6            1555414708025        1555414723279    Anger         0
7            1555414708025        1555414723279    Surprise      0
8            1555414708025        1555414723279    Stress        0
9            1555414708025        1555414723279    Neutral       0
10           1555414813304        1555414831795    Valence       3   
11           1555414813304        1555414831795    Arousal       5   
12           1555414813304        1555414831795  Dominance       2   
13           1555414813304        1555414831795    Sadness       0
14           1555414813304        1555414831795    Happiness     0
15           1555414813304        1555414831795    Anger         0
16           1555414708025        1555414723279    Surprise      0
17           1555414708025        1555414723279    Stress        0
18           1555414708025        1555414723279    Neutral       1   
19           1555414921819        1555414931382    Valence       1   
20           1555414921819        1555414931382    Arousal       7   
21           1555414921819        1555414931382  Dominance       2   
22           1555414921819        1555414931382    Sadness       1 
23           1555414921819        1555414931382   Happiness      0  
24           1555414921819        1555414931382    Anger         1
25           1555414708025        1555414723279    Surprise      0
26           1555414708025        1555414723279    Stress        1 
27           1555414708025        1555414723279    Neutral       0

How can this be done?

BENY · Accepted Answer

You can do with filter before groupby agg + all , then concat back the result

s=df.loc[df.name.isin(['Sadness', 'Happiness', 'Anger', 'Surprise' , 'Stress']),'rating'].\
       eq(0).\
            groupby([df['start_timestamp_milli'],df['end_timestamp_milli']]).\
                 agg('all').reset_index().assign(name='Neutral')
df=pd.concat([df,s],sort=False).sort_values(['start_timestamp_milli','end_timestamp_milli'])
df
Out[66]: 
    start_timestamp_milli  end_timestamp_milli       name  rating
1           1555414708025        1555414723279    Valence       2
2           1555414708025        1555414723279    Arousal       6
3           1555414708025        1555414723279  Dominance       2
4           1555414708025        1555414723279    Sadness       1
5           1555414708025        1555414723279  Happiness       0
6           1555414708025        1555414723279      Anger       0
7           1555414708025        1555414723279   Surprise       0
8           1555414708025        1555414723279     Stress       0
15          1555414708025        1555414723279   Surprise       0
16          1555414708025        1555414723279     Stress       0
23          1555414708025        1555414723279   Surprise       0
24          1555414708025        1555414723279     Stress       1
0           1555414708025        1555414723279    Neutral       0
9           1555414813304        1555414831795    Valence       3
10          1555414813304        1555414831795    Arousal       5
11          1555414813304        1555414831795  Dominance       2
12          1555414813304        1555414831795    Sadness       0
13          1555414813304        1555414831795  Happiness       0
14          1555414813304        1555414831795      Anger       0
1           1555414813304        1555414831795    Neutral       1
17          1555414921819        1555414931382    Valence       1
18          1555414921819        1555414931382    Arousal       7
19          1555414921819        1555414931382  Dominance       2
20          1555414921819        1555414931382    Sadness       1
21          1555414921819        1555414931382  Happiness       0
22          1555414921819        1555414931382      Anger       1
2           1555414921819        1555414931382    Neutral       0

user3483203 · Answer

I would pivot first, which makes the lookup a lot easier, and then adjust your initial DataFrame.

f = (df.pivot_table(index=['start_timestamp_milli', 'end_timestamp_milli'],
        columns='name', values='rating', aggfunc='any', fill_value=0).astype(int))

Now to find the timestamp combinations:

cols = ['Sadness', 'Happiness', 'Anger', 'Surprise', 'Stress']
appd = pd.Series(np.where(f[cols].any(1), 0, 1), index=f.index)

res = pd.concat([df, appd.rename('rating').reset_index().assign(name='Neutral')])

    end_timestamp_milli       name  rating  start_timestamp_milli
1         1555414723279    Valence       2          1555414708025
2         1555414723279    Arousal       6          1555414708025
3         1555414723279  Dominance       2          1555414708025
4         1555414723279    Sadness       1          1555414708025
5         1555414723279  Happiness       0          1555414708025
6         1555414723279      Anger       0          1555414708025
7         1555414723279   Surprise       0          1555414708025
8         1555414723279     Stress       0          1555414708025
9         1555414831795    Valence       3          1555414813304
10        1555414831795    Arousal       5          1555414813304
11        1555414831795  Dominance       2          1555414813304
12        1555414831795    Sadness       0          1555414813304
13        1555414831795  Happiness       0          1555414813304
14        1555414831795      Anger       0          1555414813304
15        1555414723279   Surprise       0          1555414708025
16        1555414723279     Stress       0          1555414708025
17        1555414931382    Valence       1          1555414921819
18        1555414931382    Arousal       7          1555414921819
19        1555414931382  Dominance       2          1555414921819
20        1555414931382    Sadness       1          1555414921819
21        1555414931382  Happiness       0          1555414921819
22        1555414931382      Anger       1          1555414921819
23        1555414723279   Surprise       0          1555414708025
24        1555414723279     Stress       1          1555414708025
0         1555414723279    Neutral       0          1555414708025
1         1555414831795    Neutral       1          1555414813304
2         1555414931382    Neutral       0          1555414921819

This will add them to the end of the Frame, you would need to sort if you want them intermittently placed.

Adding rows to DataFrame conditioned on values

Tags:

python

pandas

dataframe

pandas-groupby

machinery

2 Answers

BENY

user3483203

Recent Activity

Donate For Us

Adding rows to DataFrame conditioned on values

Tags:

python

pandas

dataframe

pandas-groupby

machinery

2 Answers

BENY

user3483203

Related questions

Recent Activity

Donate For Us