I am very new to the python pandas module.
Suppose I have a data frame or table as follows:
df = pd.DataFrame({
'Column A': [12,12,12, 15, 16, 141, 141, 141, 141],
'Column B':['Apple' ,'Apple' ,'Apple' , 'Red', 'Blue', 'Yellow', 'Yellow', 'Yellow', 'Yellow'],
'Column C':[100, 50, np.nan , 23 , np.nan , 199 , np.nan , 1,np.nan]
})
or I have a data table as follows:
| Column A | Column B |Column C
----| -------- | ---------|--------
0 | 12 | Apple |100
1 | 12 | Apple |50
2 | 12 | Apple |NaN
3 | 15 | Red |23
4 | 16 | Blue |NaN
5 | 141 | Yellow |199
6 | 141 | Yellow |NaN
7 | 141 | Yellow |1
8 | 141 | Yellow |NaN
If values in column A are repeated then add the corresponding values in Column C and paste the sum in new column D (For example, there are 3 rows for 12, thus we should add corresponding values 100 + 50 + NaN, and the sum result 150 should be stored in new column D).
If values in column A do not repeat directly paste Column C values in new column D (such as row 3) but for NaN, it should be 0 (such as row 4).
Could you please help me to get an output like this in python jupyter notebook:
| Column A | Column B |Column C |Column D
----- | -------- | ---------|---------|---------
0 | 12 | Apple |100 |150
1 | 15 | Red |23 |23
2 | 16 | Blue |NaN |0
3 | 141 | Yellow |199 |200
df.groupby("Column A", as_index=False).agg(B=("Column B", "first"),
C=("Column C", "first"),
D=("Column C", "sum"))
# Column A B C D
# 0 12 Apple 100.0 150.0
# 1 15 Red 23.0 23.0
# 2 16 Blue NaN 0.0
# 3 141 Yellow 199.0 200.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With