Combining rows to 'others' in pandas

Tags: python, pandas

I have a pandas dataframe like this:

  character  count
0         a    104
1         b     30
2         c    210
3         d     40
4         e    189
5         f     20
6         g     10
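
To reproduce the frame above, something like the following should work. This is a minimal sketch: the variable name `df` is an assumption, chosen because the answer below uses it.

```python
import pandas as pd

# Rebuild the sample data from the table above.
# The name `df` is an assumption, matching the name used in the answer.
df = pd.DataFrame({
    "character": list("abcdefg"),
    "count": [104, 30, 210, 40, 189, 20, 10],
})
print(df)
```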

I want to keep only the top 3 characters by count and combine the remaining rows into a single row labelled others, so that the table becomes:

  character  count
0         c    210
1         e    189
2         a    104
3    others    100

How can I achieve this?

Thank you.

asked Apr 21 '17 09:04 by nt.jin



1 Answer

We can use the DataFrame.nlargest() method to pick the top 3 rows, then append the sum of the dropped rows:

In [31]: new = df.nlargest(3, columns='count')

In [32]: new = pd.concat(
    ...:         [new,
    ...:          pd.DataFrame({'character':['others'],
    ...:                        'count':df.drop(new.index)['count'].sum()})
    ...:         ], ignore_index=True)
    ...:

In [33]: new
Out[33]:
  character  count
0         c    210
1         e    189
2         a    104
3    others    100

Or a slightly less idiomatic solution, which keeps the original index labels of the top rows:

In [16]: new = df.nlargest(3, columns='count')

In [17]: new.loc[len(new)] = ['others', df.drop(new.index)['count'].sum()]

In [18]: new
Out[18]:
  character  count
2         c    210
4         e    189
0         a    104
3    others    100
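
Both variants can be wrapped into a small reusable helper. The sketch below is not part of the original answer. The function name `top_n_with_others` and its parameters are hypothetical, and it assumes the frame has a unique index, because `drop` removes rows by label. It follows the `pd.concat` approach because `new.loc[len(new)] = ...` overwrites a row instead of appending one whenever the label `len(new)` already exists among the top rows.

```python
import pandas as pd


def top_n_with_others(df, n=3, label_col="character", value_col="count",
                      other_label="others"):
    """Keep the n largest rows by value_col and sum the rest into one row.

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    top = df.nlargest(n, columns=value_col)
    # Sum the values of every row that did not make the top n.
    rest_total = df.drop(top.index)[value_col].sum()
    others = pd.DataFrame({label_col: [other_label], value_col: [rest_total]})
    # ignore_index=True renumbers the result 0..n.
    return pd.concat([top, others], ignore_index=True)


# Sample data from the question.
df = pd.DataFrame({
    "character": list("abcdefg"),
    "count": [104, 30, 210, 40, 189, 20, 10],
})

result = top_n_with_others(df)
print(result)
```

With the sample data, the remaining rows b, d, f and g sum to 30 + 40 + 20 + 10 = 100, which matches the expected table in the question. If n is at least the number of rows, this sketch still appends an others row with a count of 0; skip the concat in that case if that is not wanted.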
answered Sep 27 '22 19:09 by MaxU - stop WAR against UA