I have the following pandas dataframe:
       token  year  uses  books
386  xanthos  1830     3      3
387  xanthos  1840     1      1
388  xanthos  1840     2      2
389  xanthos  1868     2      2
390  xanthos  1875     1      1

I aggregate the rows with duplicate token and years like so:
dfalph = dfalph[['token','year','uses','books']].groupby(['token', 'year']).agg([np.sum])
dfalph.columns = dfalph.columns.droplevel(1)

               uses  books
token    year
xanthos  1830     3      3
         1840     3      3
         1867     2      2
         1868     2      2
         1875     1      1

Instead of having the 'token' and 'year' fields in the index, I would like to return them to columns and have an integer index.
To reset the index after a group-by, first group by one or more columns using groupby() and aggregate. Then call reset_index() on the result to turn the group keys back into ordinary columns.
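As a minimal sketch of that two-step pattern, the snippet below rebuilds the question's five rows by hand and sums the duplicates. The name dfalph follows the question; grouped and result are illustrative names, and sum() is assumed to be the aggregation wanted.

```python
import pandas as pd

# Rows copied from the question; the original row labels 386-390 are kept.
dfalph = pd.DataFrame(
    {
        "token": ["xanthos"] * 5,
        "year": [1830, 1840, 1840, 1868, 1875],
        "uses": [3, 1, 2, 2, 1],
        "books": [3, 1, 2, 2, 1],
    },
    index=[386, 387, 388, 389, 390],
)

# Step 1: group and aggregate. The group keys become a MultiIndex.
grouped = dfalph.groupby(["token", "year"])[["uses", "books"]].sum()

# Step 2: move the index levels back into columns and get a 0..n-1 index.
result = grouped.reset_index()
print(result)
```

After step 2, 'token' and 'year' are regular columns again and the two 1840 rows have been merged into one.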
Overview: Create a dataframe from a dictionary. Group by item_id and find the max value. Enumerate over the grouped dataframe and use the key, which is a numeric value, to return the alpha index value. Create a result_df dataframe if you desire.
pandas MultiIndex to columns: use the pandas DataFrame.reset_index() function to move the levels of a MultiIndex (multi-level index) into columns. The drop parameter defaults to False, which keeps the index values as columns and gives the DataFrame a new integer index starting from zero.
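The effect of the drop parameter can be sketched on a small hand-built frame. The data here is illustrative, the names kept, dropped and partial are hypothetical, and the level argument is shown only as a related option of the same method.

```python
import pandas as pd

# A small frame with a two-level index, similar in shape to a groupby result.
index = pd.MultiIndex.from_tuples(
    [("xanthos", 1830), ("xanthos", 1840)], names=["token", "year"]
)
g = pd.DataFrame({"uses": [3, 3], "books": [3, 3]}, index=index)

# Default (drop=False): both index levels become columns.
kept = g.reset_index()

# drop=True: the index levels are discarded instead of becoming columns.
dropped = g.reset_index(drop=True)

# level="year": only that level becomes a column; 'token' stays as the index.
partial = g.reset_index(level="year")

print(kept.columns.tolist())
print(dropped.columns.tolist())
print(partial.columns.tolist())
```

For the question's use case the default is the one to use, since drop=True would throw away the token and year values.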
The Groupby Rolling function does not preserve the original index and so when dates are the same within the Group, it is impossible to know which index value it pertains to from the original dataframe.
Method #1: reset_index()
>>> g
              uses  books
               sum    sum
token   year
xanthos 1830     3      3
        1840     3      3
        1868     2      2
        1875     1      1

[4 rows x 2 columns]
>>> g = g.reset_index()
>>> g
     token  year  uses  books
                   sum    sum
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]

Method #2: don't make the index in the first place, using as_index=False
>>> g = dfalph[['token', 'year', 'uses', 'books']].groupby(['token', 'year'], as_index=False).sum()
>>> g
     token  year  uses  books
0  xanthos  1830     3      3
1  xanthos  1840     3      3
2  xanthos  1868     2      2
3  xanthos  1875     1      1

[4 rows x 4 columns]