I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, but it seems to be slanted towards performing some numeric function on the group - I just want to keep the text. Here's what I've got...
I have a dataframe of the form:
index   ID    apples   pears   oranges
0       101                    oranges
1       134   apples
2       576            pears
3       837   apples
4       576                    oranges
5       134            pears
The columns are clean: the apples column will only ever contain the text "apples", or it will be blank.
Where there are multiple entries under the same ID (in this example, on IDs 134 & 576), I want to collapse the rows together to get this:
index   ID    apples   pears   oranges
0       101                    oranges
1       134   apples   pears
2       576            pears   oranges
3       837   apples
I could do this by iterating over the rows, but it seems like a non-pandas solution. Is there a better way?
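For reference, the sample data can be rebuilt like this (a minimal sketch, assuming the blank cells are empty strings rather than NaN):

import pandas as pd

# Rebuild the example data from the question; blanks are empty strings here
df = pd.DataFrame({
    'ID':      [101, 134, 576, 837, 576, 134],
    'apples':  ['', 'apples', '', 'apples', '', ''],
    'pears':   ['', '', 'pears', '', '', 'pears'],
    'oranges': ['oranges', '', '', '', 'oranges', ''],
})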
You can use groupby with the aggregation ''.join, sum, or max:
# if the blank values are NaN, replace them with '' first
df = df.fillna('')
df = df.groupby('ID').agg(''.join)
print(df)
     apples   pears   oranges
ID
101                   oranges
134  apples   pears
576           pears   oranges
837  apples
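If you want ID back as an ordinary column, as in the desired output in the question, you can keep it out of the index (a small addition on top of the answer's code, not part of the original):

# keep ID as a regular column instead of the group index
df = df.groupby('ID', as_index=False).agg(''.join)
# or equivalently: df.groupby('ID').agg(''.join).reset_index()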
This also works:
df = df.fillna('')
df = df.groupby('ID').sum()
# alternatively use max
# df = df.groupby('ID').max()
print(df)
     apples   pears   oranges
ID
101                   oranges
134  apples   pears
576           pears   oranges
837  apples
If you also need to remove duplicates per group and per column, add unique:
df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
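As a quick illustration of the difference (hypothetical duplicate rows, not from the question): if an ID had two 'apples' rows, plain ''.join would concatenate them, while the unique version keeps a single value.

import pandas as pd

# hypothetical duplicate rows for one ID
dup = pd.DataFrame({'ID': [134, 134], 'apples': ['apples', 'apples'],
                    'pears': ['', ''], 'oranges': ['', '']})

print(dup.groupby('ID').agg(''.join)['apples'])                        # apples -> 'applesapples'
print(dup.groupby('ID').agg(lambda x: ''.join(x.unique()))['apples'])  # apples -> 'apples'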