I'm trying to left join multiple pandas dataframes on a single Id
column, but when I attempt the merge I get warning:
KeyError: 'Id'.
I think it might be because my dataframes have offset columns resulting from a groupby
statement, but I could very well be wrong. Either way I can't figure out how to "unstack" my dataframe column headers. None of the answers at this question seem to work.
My groupby
code:
step1 = pd.DataFrame(step3.groupby(['Id', 'interestingtabsplittest2__grp'])['applications'].sum()) step1.sort('applications', ascending=False).head(3)
Returns:
How to get those offset headers into the top level?
Using reset_index() function Pandas provide a function called reset_index() to flatten the hierarchical index created due to the groupby aggregation function in Python.
The first index will have the column name and the second index will have the name of the aggregated function. Now, use stack() at level 0 of the grouped dataframe and unstack() the grouped dataframe. Then, use stack() at level 1 of the grouped dataframe and unstack() the grouped dataframe.
Flatten columns: use get_level_values() Flatten columns: use to_flat_index() Flatten columns: join column labels. Flatten rows: flatten all levels.
The first method to flatten the pandas dataframe is through NumPy python package. There is a function in NumPy that is numpy. flatten() that perform this task. First, you have to convert the dataframe to numpy using the to_numpy() method and then apply the flatten() method.
You're looking for .reset_index()
.
In [11]: df = pd.DataFrame([[2, 3], [5, 6]], pd.Index([1, 4], name="A"), columns=["B", "C"]) In [12]: df Out[12]: B C A 1 2 3 4 5 6 In [13]: df.reset_index() Out[13]: A B C 0 1 2 3 1 4 5 6
Note: That you can avoid this step by using as_index=False
when doing the groupby.
step1 = step3.groupby(['Id', 'interestingtabsplittest2__grp'], as_index=False)['applications'].sum()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With