Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Joining a Pandas series with a hierarchical index back to the source DataFrame

I'm trying to wrap my brain around pandas data structures and trying to use them in anger a bit. I've figured out that groupby operations result in a pandas series object. But I can't quite figure out how to use the resulting series. In particular, I want to do two thing:

1) "join" the results back to the initial DataFrame

2) select a specific value from the resulting series based on the hierarchical index.

Here's a toy example to work with:

import pandas
df = pandas.DataFrame({'group1': ['a','a','a','b','b','b'],
                       'group2': ['c','c','d','d','d','e'],
                       'value1': [1.1,2,3,4,5,6],
                       'value2': [7.1,8,9,10,11,12]
})
dfGrouped = df.groupby( ["group1", "group2"] , sort=True)

## toy function, obviously not my real function
def fun(x): return mean(x**2)

results = dfGrouped.apply(lambda x: fun(x.value1))

so the resulting series (results) looks like this:

group1  group2
a       c          2.605
        d          9.000
b       d         20.500
        e         36.000

That makes sense. But how do I:

1) join this back to the original DataFrame df

2) Select a single value where, say, group1=='b' & group2=='d'

like image 492
JD Long Avatar asked Aug 09 '12 14:08

JD Long


People also ask

Can you merge a series to a DataFrame pandas?

By using pandas. concat() you can combine pandas objects for example multiple series along a particular axis (column-wise or row-wise) to create a DataFrame. concat() method takes several params, for our scenario we use list that takes series to combine and axis=1 to specify merge series as columns instead of rows.

How do I keep index from merging pandas?

You can also use DataFrame. join() method to achieve the same thing. The join method will persist the original index. The column to join can be specified with on parameter.

How do I merge two DataFrames based on index?

concat() to Merge Two DataFrames by Index. You can concatenate two DataFrames by using pandas. concat() method by setting axis=1 , and by default, pd. concat is a row-wise outer join.


2 Answers

So for remaining #1.

In [9]: df
Out[9]:
  group1 group2  value1  value2
0      a      c     1.1     7.1
1      a      c     2.0     8.0
2      a      d     3.0     9.0
3      b      d     4.0    10.0
4      b      d     5.0    11.0
5      b      e     6.0    12.0

In [10]: results
Out[10]:
group1  group2
a       c          2.605
        d          9.000
b       d         20.500
        e         36.000

In [11]: df.set_index(['group1', 'group2'], inplace=True)['results'] = results

In [12]: df
Out[12]:
               value1  value2  results
group1 group2
a      c          1.1     7.1    2.605
       c          2.0     8.0    2.605
       d          3.0     9.0    9.000
b      d          4.0    10.0   20.500
       d          5.0    11.0   20.500
       e          6.0    12.0   36.000

In [13]: df.reset_index()
Out[13]:
  group1 group2  value1  value2  results
0      a      c     1.1     7.1    2.605
1      a      c     2.0     8.0    2.605
2      a      d     3.0     9.0    9.000
3      b      d     4.0    10.0   20.500
4      b      d     5.0    11.0   20.500
5      b      e     6.0    12.0   36.000
like image 108
Wouter Overmeire Avatar answered Nov 13 '22 14:11

Wouter Overmeire


While monkeying around I discovered the answer to #2:

results["b","d"] gives me the value where group1=='b' & group2=='d'

like image 44
JD Long Avatar answered Nov 13 '22 16:11

JD Long