
Sort Index by list - Python Pandas

Tags: python, pandas

I have a dataframe that I have pivoted:

FinancialYear   2014/2015   2015/2016   2016/2017   2017/2018
Month               
April             42           32          29          27
August            34           28          32           0
December          45           51          28           0
February          28           20          28           0
January           32           28          33           0
July              40           66          31          30
June              32           67          37          35
March             43           36          39           0
May               34           30          24          29
November          39           32          31           0
October           38           39          28           0
September         29           19          34           0

This is the code that I used:

new_hm01 = hmdf[['FinancialYear','Month','FirstReceivedDate']]

hm05 = new_hm01.pivot_table(index=['FinancialYear','Month'], aggfunc='count')

df_hm = new_hm01.groupby(['Month', 'FinancialYear']).size().unstack(fill_value=0).rename(columns=lambda x: '{}'.format(x))

The months are not in the order I want, so I used the following code to reindex the table according to a list:

vals = ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March']

df_hm = df_hm.reindex(vals)

The rows are now in the right order, but most of the values in the table show as NaN:

FinancialYear   2014/2015   2015/2016   2016/2017   2017/2018
Month               
April              nan          nan         nan         nan
May                nan          nan         nan         nan
June               nan          nan         nan         nan
July               nan          nan         nan         nan
August             nan          nan         nan         nan
September           29           19          34           0
October            nan          nan         nan         nan
November           nan          nan         nan         nan
December           nan          nan         nan         nan
January            nan          nan         nan         nan
February           nan          nan         nan         nan
March              nan          nan         nan         nan

Any idea what is happening and how to fix it? Is there a better alternative method?

ScoutEU asked Jul 29 '17 12:07



1 Answer

Unexpected NaNs after reindexing are often caused by the new index labels not exactly matching the old ones. For example, if the original index labels contain whitespace but the new labels don't, you'll get NaNs:

import pandas as pd

# The original index labels carry a trailing space
df = pd.DataFrame({'col': [1, 2, 3]}, index=['April ', 'June ', 'May '])
print(df)
#         col
# April     1
# June      2
# May       3

df2 = df.reindex(['April', 'May', 'June'])
print(df2)
#        col
# April  NaN
# May    NaN
# June   NaN
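
To check whether a label mismatch like this is the cause, it can help to look at the raw labels rather than the printed frame, where trailing spaces are invisible. The snippet below is a minimal sketch on the same toy frame; the names `wanted` and `missing` are only illustrative.

```python
import pandas as pd

# Same toy frame as above: the index labels carry a trailing space.
df = pd.DataFrame({'col': [1, 2, 3]}, index=['April ', 'June ', 'May '])
wanted = ['April', 'May', 'June']

# tolist() prints the labels with quotes, so stray whitespace becomes visible.
print(df.index.tolist())
# ['April ', 'June ', 'May ']

# Requested labels that do not occur in the existing index.
# reindex() creates an all-NaN row for each of these.
missing = [label for label in wanted if label not in df.index]
print(missing)
# ['April', 'May', 'June']
```

If `missing` is non-empty although the labels look identical on screen, whitespace or a difference in case is a likely culprit.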

This can be fixed by removing the whitespace to make the labels match:

df.index = df.index.str.strip()
df3 = df.reindex(['April', 'May', 'June'])
print(df3)
#        col
# April    1
# May      3
# June     2
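
Applied to a pipeline like the one in the question, the same idea could look like the sketch below. It assumes that stray whitespace in the `Month` column is the cause, which the question does not confirm. `hmdf_demo` is a hypothetical stand-in for the asker's `hmdf`. Stripping the column before grouping also avoids `'April '` and `'April'` being counted as separate groups, and `fill_value=0` in `reindex` gives months with no rows a count of 0 instead of NaN.

```python
import pandas as pd

# Hypothetical stand-in for hmdf; some Month values carry stray spaces.
hmdf_demo = pd.DataFrame({
    'FinancialYear': ['2014/2015', '2014/2015', '2014/2015', '2015/2016'],
    'Month': ['April ', 'May', ' January', 'April'],
})

# Financial-year month order, as in the question.
vals = ['April', 'May', 'June', 'July', 'August', 'September',
        'October', 'November', 'December', 'January', 'February', 'March']

counts = (
    hmdf_demo
    .assign(Month=hmdf_demo['Month'].str.strip())  # normalise labels first
    .groupby(['Month', 'FinancialYear'])
    .size()                                        # rows per (Month, FinancialYear)
    .unstack(fill_value=0)                         # FinancialYear becomes columns
    .reindex(vals, fill_value=0)                   # reorder rows; absent months get 0
)
print(counts)
```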

unutbu answered Sep 29 '22 04:09