Joining a Pandas series with a hierarchical index back to the source DataFrame

I'm trying to wrap my brain around pandas data structures and use them in anger a bit. I've figured out that a groupby followed by apply can return a pandas Series with a hierarchical index, but I can't quite figure out how to use the resulting Series. In particular, I want to do two things:

1) "join" the results back to the initial DataFrame

2) select a specific value from the resulting series based on the hierarchical index.

Here's a toy example to work with:

import pandas
df = pandas.DataFrame({'group1': ['a','a','a','b','b','b'],
                       'group2': ['c','c','d','d','d','e'],
                       'value1': [1.1,2,3,4,5,6],
                       'value2': [7.1,8,9,10,11,12]
})
dfGrouped = df.groupby( ["group1", "group2"] , sort=True)

## toy function, obviously not my real function
def fun(x): return (x**2).mean()  # mean of the squared values

results = dfGrouped.apply(lambda x: fun(x.value1))

The resulting Series (results) looks like this:

group1  group2
a       c          2.605
        d          9.000
b       d         20.500
        e         36.000

That makes sense. But how do I:

1) join this back to the original DataFrame df

2) Select a single value where, say, group1=='b' & group2=='d'

asked Aug 09 '12 14:08

JD Long

2 Answers

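That leaves #1, joining the results back to the original DataFrame. Set the two grouping columns as the index of df, and the assignment aligns results on that index: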

In [9]: df
Out[9]:
  group1 group2  value1  value2
0      a      c     1.1     7.1
1      a      c     2.0     8.0
2      a      d     3.0     9.0
3      b      d     4.0    10.0
4      b      d     5.0    11.0
5      b      e     6.0    12.0

In [10]: results
Out[10]:
group1  group2
a       c          2.605
        d          9.000
b       d         20.500
        e         36.000

In [11]: df = df.set_index(['group1', 'group2'])

In [12]: df['results'] = results

In [13]: df
Out[13]:
               value1  value2  results
group1 group2
a      c          1.1     7.1    2.605
       c          2.0     8.0    2.605
       d          3.0     9.0    9.000
b      d          4.0    10.0   20.500
       d          5.0    11.0   20.500
       e          6.0    12.0   36.000

In [14]: df.reset_index()
Out[14]:
  group1 group2  value1  value2  results
0      a      c     1.1     7.1    2.605
1      a      c     2.0     8.0    2.605
2      a      d     3.0     9.0    9.000
3      b      d     4.0    10.0   20.500
4      b      d     5.0    11.0   20.500
5      b      e     6.0    12.0   36.000
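
For comparison, here is a minimal sketch of another way to get the same table without touching the index of df. It is not part of the original answer: it assumes DataFrame.join with on= is acceptable for your case, and the names joined and the 'results' column label are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'group1': ['a', 'a', 'a', 'b', 'b', 'b'],
    'group2': ['c', 'c', 'd', 'd', 'd', 'e'],
    'value1': [1.1, 2, 3, 4, 5, 6],
    'value2': [7.1, 8, 9, 10, 11, 12],
})

def fun(x):
    # Same toy function as in the question: mean of the squared values.
    return (x ** 2).mean()

# One value per (group1, group2) pair, indexed by a two-level MultiIndex.
results = df.groupby(['group1', 'group2'])['value1'].apply(fun)

# Match the two key columns of df against the two index levels of results.
# The Series needs a name, which becomes the new column label (assumed here).
joined = df.join(results.rename('results'), on=['group1', 'group2'])
print(joined)
```

This leaves df itself unchanged and keeps its original integer index, so no reset_index() step is needed afterwards.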

answered Nov 13 '22 14:11

Wouter Overmeire

While monkeying around I discovered the answer to #2:

results["b","d"] gives me the value where group1=='b' & group2=='d'
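
A small sketch of a few related lookups, in case it helps someone else. It rebuilds the results Series by hand from the values shown in the question so it runs on its own; the variable names single, group_b and all_d are just illustrative.

```python
import pandas as pd

# Rebuild the results Series from the question's output.
index = pd.MultiIndex.from_tuples(
    [('a', 'c'), ('a', 'd'), ('b', 'd'), ('b', 'e')],
    names=['group1', 'group2'],
)
results = pd.Series([2.605, 9.0, 20.5, 36.0], index=index)

# Explicit form of results["b", "d"]: a full key tuple returns one value.
single = results.loc[('b', 'd')]

# A partial key on the first level returns a Series indexed by group2.
group_b = results.loc['b']

# xs selects on an inner level; the result is indexed by group1.
all_d = results.xs('d', level='group2')

print(single)
print(group_b)
print(all_d)
```

Using .loc with a tuple makes it explicit that the lookup is by label on both index levels.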

answered Nov 13 '22 16:11

JD Long


Tags:

join

pandas

dataframe

group-by

ipython
