Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sort Pandas Dataframe by substrings of a column

Given a DataFrame:

    name             email
0   Carl    [email protected]
1    Bob     [email protected]
2  Alice   [email protected]
3  David  [email protected]
4    Eve     [email protected]

How can it be sorted according to the email's domain name (alphabetically, ascending), and then, within each domain group, according to the string before the "@"?

The result of sorting the above should then be:

    name             email
0    Bob     [email protected]
1    Eve     [email protected]
2  David  [email protected]
3  Alice   [email protected]
4   Carl    [email protected]
like image 999
IamTheWalrus Avatar asked Apr 09 '18 07:04

IamTheWalrus


1 Answers

Use:

df = df.reset_index(drop=True)
idx = df['email'].str.split('@', expand=True).sort_values([1,0]).index
df = df.reindex(idx).reset_index(drop=True)
print (df)
    name             email
0    Bob     [email protected]
1    Eve     [email protected]
2  David  [email protected]
3  Alice   [email protected]
4   Carl    [email protected]

Explanation:

  1. First reset_index with drop=True for unique default indices
  2. Then split values to new DataFrame and sort_values
  3. Last reindex to new order
like image 171
jezrael Avatar answered Sep 30 '22 16:09

jezrael