 

How to sort a DataFrame by a particular (string) column using Python pandas?

My Pandas data frame contains the following data:

product,values
 a1,     10
 a5,     20
 a10,    15
 a2,     45
 a3,     12
 a6,     67

I have to sort this data frame by the product column, in descending order of the number that follows the letter. Thus, I would like to get the following output:

product,values
 a10,     15
 a6,      67
 a5,      20
 a3,      12
 a2,      45
 a1,      10

Unfortunately, I'm facing the following error:

ErrorDuringImport(path, sys.exc_info())

ErrorDuringImport: problem in views - type 'exceptions.Indentation

asked Jun 08 '16 04:06 by Sai Rajesh




1 Answer

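You can first extract the digits with str.extract and cast them to int with astype. Then call sort_values on the helper column sort, and finally drop that column:

# Helper column holding the numeric part of each product name
df['sort'] = df['product'].str.extract(r'(\d+)', expand=False).astype(int)
# Sort by the number, largest first
df.sort_values('sort', inplace=True, ascending=False)
# Remove the helper column again
df = df.drop('sort', axis=1)
print(df)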
  product  values
2     a10      15
5      a6      67
1      a5      20
4      a3      12
3      a2      45
0      a1      10

The helper column is necessary because sort_values on product alone compares the strings character by character, so a10 ends up between a2 and a1:

df.sort_values('product', inplace=True, ascending=False)
print(df)
  product  values
5      a6      67
1      a5      20
4      a3      12
3      a2      45
2     a10      15
0      a1      10
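
The ordering above follows from how Python compares strings. As a minimal sketch in plain Python, using the product names from the question (the helper name numeric_part is hypothetical and only for illustration):

```python
import re

products = ["a1", "a5", "a10", "a2", "a3", "a6"]

# Strings are compared one character at a time, so "a10" < "a2"
# because "1" < "2" at the second position.
a10_before_a2 = "a10" < "a2"

# Plain string sort, descending: "a10" lands between "a2" and "a1".
lexicographic = sorted(products, reverse=True)

def numeric_part(name):
    """Return the first run of digits in name as an int (hypothetical helper)."""
    return int(re.search(r"\d+", name).group())

# Sorting on the extracted number gives the order asked for in the question.
numeric = sorted(products, key=numeric_part, reverse=True)

print(a10_before_a2)
print(lexicographic)
print(numeric)
```
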

Another option is to use the natsort library:

from natsort import index_natsorted, order_by_index

# index_natsorted returns the positions that put 'product' in natural order
order = index_natsorted(df['product'], reverse=True)
df = df.reindex(index=order_by_index(df.index, order))
print(df)
  product  values
2     a10      15
5      a6      67
1      a5      20
4      a3      12
3      a2      45
0      a1      10
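
A further sketch, assuming pandas 1.1 or later (where sort_values accepts a key argument): the first approach can be written without the temporary sort column. The DataFrame is rebuilt from the question's data, and the helper name numeric_suffix is hypothetical:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": ["a1", "a5", "a10", "a2", "a3", "a6"],
    "values": [10, 20, 15, 45, 12, 67],
})

def numeric_suffix(col: pd.Series) -> pd.Series:
    """Return the first run of digits in each string as an int (hypothetical helper)."""
    return col.str.extract(r"(\d+)", expand=False).astype(int)

# key= receives the whole column and must return a Series of the same length;
# the rows are then ordered by those values instead of by the raw strings.
result = df.sort_values("product", key=numeric_suffix, ascending=False)
print(result)
```

This returns a sorted copy and leaves df unchanged, so no column has to be added or dropped.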
answered Sep 28 '22 06:09 by jezrael