I have a dataframe like this:
file:
| FIRST | LAST | ID |
---------------------------
0 "ABC" 12 35
1 "ABC" 14 35
2 "AB" 15 36
Now, what I want is:
file:
| FIRST | LAST | ID |
---------------------------
0 "ABC" [12,14] 35
2 "AB" 15 36
For this problem let's assume that if ID of two rows is equal then all the values except LAST is also equal.
Therefore, replace all the value except the values of last, which are added to a list.
I tried using solution given in this link: Pandas DataFrame - Combining one column's values with same index into list
I used this:
file = file.groupby('ID')
file = file['Last'].unique()
This is the output I got:
ID
35 [12, 14]
36 [15]
Name: Last, dtype: object
Probably, I am missing something in the groupby().
Thanks in advance :)
UPDATE:
My original Dataframe has more than 100 columns. if ID of two rows is equal then all the values except LAST is also equal.
Given that only the last two rows are different for a given ID, just take the first value when applying a groupby to them. For the column `LAST', use its value or convert it to a list of unique items if there are more than one.
grouping_cols = ['ID', ...]
agg_cols = {col: 'first' for col in df if col not in grouping_cols}
agg_cols['LAST'] = lambda x: x.unique().tolist() if len(x) > 1 else x.iat[0]
>>> df.groupby(grouping_cols, as_index=False).agg(agg_cols)
ID LAST FIRST
0 35 [12, 14] ABC
1 36 15 AB
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With