I am working with chicago crime data and want to aggregated count of top 5 crimes for each region/community area. However, my code works but I got unwanted index and CategoricalIndex type column in dataframe columns which stop me to access particular columns for further data manipulation.
what I did:
crimes_2012 = pd.read_csv('Chicago_Crimes_2012_to_2017.csv', sep=',', error_bad_lines=False)
df=crimes_2012[['Primary Type', 'Location Description', 'Community Area']]
crime_catg = df.groupby(['Community Name', 'Primary Type'])['Primary Type'].count().unstack()
crime_catg = crime_catg[['THEFT','BATTERY', 'CRIMINAL DAMAGE', 'NARCOTICS', 'ASSAULT']]
crime_catg = crime_catg.dropna()
here is my current output that needs to be improved:
here is my attempt:
when I tried below code, I still didn't get new index and index name displayed strange in output dataframe. why? how to fix this? any idea? Thanks
even when I tried to reindex dataframe it didn't get new index after all.
crime_catg.reindex(inplace=True, drop=True)
any idea to fix this issue? any thought?
There are a couple of ways to handle this.
1) Keep the CategoricalIndex
type and the use .add_categories
method to update valid categories eg to fix your .reindex
problem:
crime_catg.columns = crime_catg.columns.add_categories(['Community Name'])
2) Cast as pandas.Index
:
crime_catg.columns = pd.Index(list(crime_catg.columns))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With