
How to reorder rows in pandas dataframe by factor level in python?

I've created a small dataset comparing coffee drink prices per cup size.

When I pivot my dataset the output automatically reorders the index (the 'Size' column) alphabetically.

Is there a way to assign the different sizes a numerical level (e.g. small = 0, medium = 1, large = 2) and reorder the rows this way instead?

I know this can be done in R using the forcats library (using fct_relevel, for example), but I'm not aware of how to do this in Python. I would prefer to keep the solution to numpy and pandas.

import numpy as np
import pandas as pd

data = {'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
        'Size': ['Small', 'Medium', 'Large']*3,
        'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25]
       }

df = pd.DataFrame(data, columns = ['Item', 'Size', 'Price'])
# Pivot so each size is a row and each item a column
df = pd.pivot_table(df, index = ['Size'], columns = 'Item')
df

#         Price
# Item    Americano Cappuccino  Latte
#   Size            
#  Large       2.45       3.25   2.85
# Medium       2.25       2.95   2.60
#  Small       1.95       2.65   2.25
asked Mar 02 '23 09:03 by bluesky


1 Answer

You can use a Categorical type with ordered=True:

df.index = pd.Categorical(df.index,
                          categories=['Small', 'Medium', 'Large'],
                          ordered=True)
df = df.sort_index()

output:

           Price                 
Item   Americano Cappuccino Latte
Small       1.95       2.65  2.25
Medium      2.25       2.95  2.60
Large       2.45       3.25  2.85
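
For reference, here is a minimal end-to-end sketch of the approach above, assuming the data from the question. The names `pivoted` and `size_order` are illustrative and not part of the original post, and `values='Price'` is passed explicitly so the columns stay single-level.

```python
import numpy as np
import pandas as pd

data = {
    'Item': np.repeat(['Latte', 'Americano', 'Cappuccino'], 3),
    'Size': ['Small', 'Medium', 'Large'] * 3,
    'Price': [2.25, 2.60, 2.85, 1.95, 2.25, 2.45, 2.65, 2.95, 3.25],
}
df = pd.DataFrame(data)

# With a plain string index, pivot_table sorts rows alphabetically
pivoted = pd.pivot_table(df, index='Size', columns='Item', values='Price')

# Illustrative name: the desired row order, smallest to largest
size_order = ['Small', 'Medium', 'Large']

# Replace the index with an ordered Categorical, then sort by category order
pivoted.index = pd.Categorical(pivoted.index, categories=size_order, ordered=True)
pivoted = pivoted.sort_index()

print(pivoted)
```

Sorting works here because an ordered Categorical compares values by their position in `categories` rather than alphabetically.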

You can access the integer codes (0 for Small, 1 for Medium, 2 for Large) with:

>>> df.index.codes
array([0, 1, 2], dtype=int8)

If the data were a Series with a categorical dtype, you would use the .cat accessor instead:

>>> series.cat.codes
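
As a minimal sketch of the Series case, assuming a hypothetical Series named `sizes` (not from the original post), the values can be cast to an ordered categorical dtype and the numeric levels read through `.cat.codes`:

```python
import pandas as pd

# Hypothetical example data, not taken from the question
sizes = pd.Series(['Large', 'Small', 'Medium', 'Small'])

# Ordered categorical dtype: Small < Medium < Large
size_dtype = pd.CategoricalDtype(categories=['Small', 'Medium', 'Large'],
                                 ordered=True)
sizes = sizes.astype(size_dtype)

# Integer code of each value, following the category order
codes = sizes.cat.codes
print(codes.tolist())  # [2, 0, 1, 0]

# Sorting now follows the category order, not the alphabet
sorted_sizes = sizes.sort_values()
print(sorted_sizes.tolist())  # ['Small', 'Small', 'Medium', 'Large']
```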
answered Apr 06 '23 10:04 by mozway