I would like to add column names to the results of a groupby on a DataFrame
in Python 3.6.
I tried this code:
import pandas as pd
d = {'timeIndex': [1, 1, 1, 1, 2, 2, 2], 'isZero': [0,0,0,1,0,0,0]}
df = pd.DataFrame(data=d)
df2 = df.groupby(['timeIndex'])['isZero'].sum()
print(df2)
Result:
timeIndex
1    1
2    0
Name: isZero, dtype: int64
It looks like timeIndex is a column heading, but attempts to address a column by name raise exceptions.
df2['timeIndex']
# KeyError: 'timeIndex'
df2['isZero']
# KeyError: 'isZero'
I am looking for this result:
df2
   timeIndex  isZero
0          1       1
1          2       0
df2['isZero']
0    1
1    0
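Some context on the KeyError may help here. The following is a minimal sketch, assuming the same data as in the question; the variable name `result` is only for illustration. It suggests that the groupby output is a Series rather than a DataFrame: timeIndex is the name of its index and isZero is the name of the Series itself, so neither is a column that can be selected with square brackets.

```python
import pandas as pd

d = {'timeIndex': [1, 1, 1, 1, 2, 2, 2], 'isZero': [0, 0, 0, 1, 0, 0, 0]}
df = pd.DataFrame(data=d)

# Selecting a single column before aggregating yields a Series.
result = df.groupby(['timeIndex'])['isZero'].sum()

print(type(result).__name__)  # class of the result
print(result.index.name)      # the group key becomes the index name
print(result.name)            # the aggregated column becomes the Series name

# Square brackets on a Series look up index labels (here 1 and 2),
# so the group values are reached through the index instead.
print(result.loc[1])
```

Both methods below work by turning that index back into an ordinary column.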
Method 1: use the argument as_index=False in your groupby:
df2 = df.groupby(['timeIndex'], as_index=False)['isZero'].sum()
>>> df2
   timeIndex  isZero
0          1       1
1          2       0
>>> df2['isZero']
0    1
1    0
Name: isZero, dtype: int64
Method 2: you can use to_frame with your desired column name and then reset_index:
df2 = df.groupby(['timeIndex'])['isZero'].sum().to_frame('isZero').reset_index()
>>> df2
   timeIndex  isZero
0          1       1
1          2       0
>>> df2['isZero']
0    1
1    0
Name: isZero, dtype: int64
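A shorter variant of Method 2 may also work: calling reset_index directly on the Series returns a DataFrame, so the to_frame step can be skipped. This is a sketch rather than part of the methods above; the names `df3`, `df4` and the column name 'zeroCount' are hypothetical and only used for illustration.

```python
import pandas as pd

d = {'timeIndex': [1, 1, 1, 1, 2, 2, 2], 'isZero': [0, 0, 0, 1, 0, 0, 0]}
df = pd.DataFrame(data=d)

# Series.reset_index() moves the index into a column and returns a DataFrame;
# the value column takes the Series name ('isZero').
df3 = df.groupby('timeIndex')['isZero'].sum().reset_index()
print(list(df3.columns))

# Optional: name= gives the value column a different name
# ('zeroCount' is a hypothetical name, not from the question).
df4 = df.groupby('timeIndex')['isZero'].sum().reset_index(name='zeroCount')
print(list(df4.columns))
```

The name= argument is only needed when the value column should be called something other than the original column name.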