Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Groupby a DataFrame into a new DataFrame with arange as index [duplicate]

I have a question, simplified in this example. Consider this Pandas DataFrame, df_a:

df_a=pd.DataFrame([['1001',34.3,'red'],['1001',900.04,'red'],['1001',776,'red'],['1003',18.95,'green'],['1004',321.2,'blue']],columns=['id','amount','name'])

    id      amount  name
0   1001    34.30   red
1   1001    900.04  red
2   1001    776.00  red
3   1003    18.95   green
4   1004    321.20  blue

I would like to groupby this dataframe by summing the amount into a new DataFrame and create a new 'arange'-like index. This should be the result I would like to have:

    id      amount
0   1001    1710.34
1   1003    18.95
2   1004    321.20

But my efforts create a Series (I would like a DataFrame as result):

df_a.groupby(['id'])['amount'].sum()

id
1001    1710.34
1003      18.95
1004     321.20
Name: amount, dtype: float64

or create a new index based on the id column:

pd.DataFrame(df_a.groupby(['id'])['amount'].sum())

        amount
id  
1001    1710.34
1003    18.95
1004    321.20

I've also tried to pass the index parameter, but that doesn't work either:

pd.DataFrame(df_a.groupby(['id'])['amount'].sum(),index=df_a.index.values)

   amount
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN

Does anyone have an elegant solution for this ?

like image 747
Bas Avatar asked Dec 20 '17 01:12

Bas


2 Answers

You have a parameter as_index in groupby for that

df_a.groupby('id', as_index = False)['amount'].sum()

You get

    id  amount
0   1001    1710.34
1   1003    18.95
2   1004    321.20
like image 109
Vaishali Avatar answered Sep 17 '22 15:09

Vaishali


You can try the following by adding to_frame() and reset_index():

new_df = df_a.groupby(['id'])['amount'].sum().to_frame('amount').reset_index()
print(new_df)

Result:

     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20

If you only use to_frame() i.e. using

df_a.groupby(['id'])['amount'].sum().to_frame('amount')

it will keep index on id as following:

      amount
id           
1001  1710.34
1003    18.95
1004   321.20

Other way is to reset index on dataframe in your above code:

new_df = pd.DataFrame(df_a.groupby(['id'])['amount'].sum()).reset_index()

Output would be same as above:

     id   amount
0  1001  1710.34
1  1003    18.95
2  1004   321.20
like image 45
student Avatar answered Sep 17 '22 15:09

student