If I have the following DataFrame:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['Bob'] * 3 + ['Alice'] * 3,
...                    'Destination': ['Athens', 'Rome'] * 3,
...                    'Length': np.random.randint(1, 6, 6)})
>>> df
Destination Length Name
0 Athens 3 Bob
1 Rome 5 Bob
2 Athens 2 Bob
3 Rome 1 Alice
4 Athens 3 Alice
5 Rome 5 Alice
I can group by name and destination...
>>> grouped = df.groupby(['Name', 'Destination'])
>>> for nm, gp in grouped:
...     print(nm)
...     print(gp)
('Alice', 'Athens')
Destination Length Name
4 Athens 3 Alice
('Alice', 'Rome')
Destination Length Name
3 Rome 1 Alice
5 Rome 5 Alice
('Bob', 'Athens')
Destination Length Name
0 Athens 3 Bob
2 Athens 2 Bob
('Bob', 'Rome')
Destination Length Name
1 Rome 5 Bob
but I would like a new multi-indexed DataFrame out of it that looks something like
               Length
Alice  Athens       3
       Rome         1
       Rome         5
Bob    Athens       3
       Athens       2
       Rome         5
It seems there should be a way to do something like DataFrame(grouped)
to get my multi-indexed DataFrame, but instead I get a PandasError
("DataFrame constructor not properly called!").
What is the easiest way to get this? Also, does anyone know if there will ever be an option to pass a groupby object to the constructor, or am I just doing it wrong?
Thanks
Creating a MultiIndex (hierarchical index) object: A MultiIndex can be created from a list of arrays (using MultiIndex.from_arrays()), an array of tuples (using MultiIndex.from_tuples()), a crossed set of iterables (using MultiIndex.from_product()), or a DataFrame (using MultiIndex.
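As a rough illustration of the first three constructors named above, here is a minimal sketch. The sample names and destinations are borrowed from the question, and the level names 'Name' and 'Destination' are my own choice for the example.

```python
import pandas as pd

# Illustrative values only, borrowed from the question's data.
names = ['Bob', 'Bob', 'Alice']
destinations = ['Athens', 'Rome', 'Rome']
levels = ['Name', 'Destination']

# One array per index level, aligned position by position.
from_arrays = pd.MultiIndex.from_arrays([names, destinations], names=levels)

# One tuple per row, each tuple holding a value for every level.
from_tuples = pd.MultiIndex.from_tuples(list(zip(names, destinations)), names=levels)

# Every combination of the given iterables (2 names x 2 destinations = 4 entries).
from_product = pd.MultiIndex.from_product(
    [['Bob', 'Alice'], ['Athens', 'Rome']], names=levels
)

print(from_arrays)
print(from_product)
```

Here from_arrays and from_tuples describe the same index, while from_product always yields the full cross of its inputs, whether or not every combination occurs in your data.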
How do you group by an index in pandas? Pass the index name of the DataFrame to groupby() to group rows on that index. DataFrame.groupby() takes a string or a list specifying the columns or index levels to group by.
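A small sketch of grouping on an index, using the question's columns with fixed Length values (the question's values are random, so the numbers here are assumed for the example). It shows both passing the index name and the equivalent level= keyword.

```python
import pandas as pd

# Fixed Length values chosen for illustration; the question uses random ones.
df = pd.DataFrame({
    'Name': ['Bob'] * 3 + ['Alice'] * 3,
    'Destination': ['Athens', 'Rome'] * 3,
    'Length': [3, 5, 2, 1, 3, 5],
})

# Move 'Name' into the index, then group on it by name.
indexed = df.set_index('Name')
by_name = indexed.groupby('Name')['Length'].sum()

# The same grouping, spelled with the explicit level= keyword.
by_level = indexed.groupby(level='Name')['Length'].sum()

print(by_name)
```

Both spellings should give the same per-name totals; level= is the less ambiguous choice if a column and an index level could share a name.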
Since you're not aggregating similarly indexed rows, try setting the index with a list of column names.
In [2]: df.set_index(['Name', 'Destination'])
Out[2]:
                   Length
Name  Destination
Bob   Athens            3
      Rome              5
      Athens            2
Alice Rome              1
      Athens            3
      Rome              5
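Note that set_index keeps the original row order, so rows with the same (Name, Destination) pair are not adjacent as they are in the layout the question asks for. A minimal sketch of one way to get that layout, assuming a sorted index is acceptable, is to chain sort_index(). The Length values below are fixed for reproducibility rather than random.

```python
import pandas as pd

# Same shape as the question's frame, with fixed Length values.
df = pd.DataFrame({
    'Name': ['Bob'] * 3 + ['Alice'] * 3,
    'Destination': ['Athens', 'Rome'] * 3,
    'Length': [3, 5, 2, 1, 3, 5],
})

# Build the two-level index, then sort it so equal keys sit together.
result = df.set_index(['Name', 'Destination']).sort_index()

print(result)
```

Sorting the index also helps with later label-based slicing, since pandas can warn about or refuse some slices on an unsorted MultiIndex.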