I'm trying to wrap my brain around pandas
data structures and trying to use them in anger a bit. I've figured out that groupby
operations result in a pandas series object. But I can't quite figure out how to use the resulting series. In particular, I want to do two thing:
1) "join" the results back to the initial DataFrame
2) select a specific value from the resulting series based on the hierarchical index.
Here's a toy example to work with:
import pandas
df = pandas.DataFrame({'group1': ['a','a','a','b','b','b'],
'group2': ['c','c','d','d','d','e'],
'value1': [1.1,2,3,4,5,6],
'value2': [7.1,8,9,10,11,12]
})
dfGrouped = df.groupby( ["group1", "group2"] , sort=True)
## toy function, obviously not my real function
def fun(x): return mean(x**2)
results = dfGrouped.apply(lambda x: fun(x.value1))
so the resulting series (results) looks like this:
group1 group2
a c 2.605
d 9.000
b d 20.500
e 36.000
That makes sense. But how do I:
1) join this back to the original DataFrame df
2) Select a single value where, say, group1=='b' & group2=='d'
By using pandas. concat() you can combine pandas objects for example multiple series along a particular axis (column-wise or row-wise) to create a DataFrame. concat() method takes several params, for our scenario we use list that takes series to combine and axis=1 to specify merge series as columns instead of rows.
You can also use DataFrame. join() method to achieve the same thing. The join method will persist the original index. The column to join can be specified with on parameter.
concat() to Merge Two DataFrames by Index. You can concatenate two DataFrames by using pandas. concat() method by setting axis=1 , and by default, pd. concat is a row-wise outer join.
So for remaining #1.
In [9]: df
Out[9]:
group1 group2 value1 value2
0 a c 1.1 7.1
1 a c 2.0 8.0
2 a d 3.0 9.0
3 b d 4.0 10.0
4 b d 5.0 11.0
5 b e 6.0 12.0
In [10]: results
Out[10]:
group1 group2
a c 2.605
d 9.000
b d 20.500
e 36.000
In [11]: df.set_index(['group1', 'group2'], inplace=True)['results'] = results
In [12]: df
Out[12]:
value1 value2 results
group1 group2
a c 1.1 7.1 2.605
c 2.0 8.0 2.605
d 3.0 9.0 9.000
b d 4.0 10.0 20.500
d 5.0 11.0 20.500
e 6.0 12.0 36.000
In [13]: df.reset_index()
Out[13]:
group1 group2 value1 value2 results
0 a c 1.1 7.1 2.605
1 a c 2.0 8.0 2.605
2 a d 3.0 9.0 9.000
3 b d 4.0 10.0 20.500
4 b d 5.0 11.0 20.500
5 b e 6.0 12.0 36.000
While monkeying around I discovered the answer to #2:
results["b","d"]
gives me the value where group1=='b' & group2=='d'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With