
pandas partial join on multiindex

Tags: python, pandas

So, this is my problem:

import numpy as np
import pandas as pd

# dfa is indexed by (a, b, c); dfb only by (a, b)
dfa = pd.DataFrame({"a": [["a", "b", "c"][k // 10] for k in range(30)],
                    "b": ["a" + repr([10, 20, 30, 40, 50, 60][k // 5]) for k in range(30)],
                    "c": np.arange(30),
                    "d": np.random.normal(size=30)}).set_index(["a","b","c"])
dfb = pd.DataFrame({"a": [["a", "b", "c"][k // 2] for k in range(6)],
                    "b": ["a" + repr([10, 20, 30, 40, 50, 60][k]) for k in range(6)],
                    "m": np.random.normal(size=6)**2}).set_index(["a","b"])

Essentially I have two dataframes with multi-indices, and I want to divide dfa.d by dfb.m, joining on the shared levels ("a", "b"). I can't naively do dfa.d / dfb.m, or use join, because pandas raises an error saying that merging with more than one level overlap on a multi-index is not implemented.

The most straightforward (lol) way of doing this that I found is:

dfc = dfa.reset_index().set_index(["a", "b"]).join(dfb)
dfc["r"] = dfc.d / dfc.m
dfd = dfc.reset_index().set_index(["a", "b", "c"])[["r"]]

Any shortcuts?

asked Aug 25 '14 20:08 by marco

1 Answer

There's an open bug for this problem and the current milestone says 0.15.1.

Until something nicer comes along, there's a workaround involving the following steps:

  • get the non-matching index level(s) out of the way by unstacking them into columns
  • perform the multiplication/division
  • stack the columns back to where they were.

Like this (shown here with .mul; use .div instead for the division asked about in the question):

In [109]: dfa.unstack('c').mul(dfb.squeeze(), axis=0).stack('c')
Out[109]: 
                  d
a b   c            
a a10 0    1.535221
      1   -2.151894
      2    1.986061
      3   -1.946031
      4   -4.868800
  a20 5   -2.278917
      6   -1.535684
      7    2.289102
      8   -0.442284
      9   -0.547209
b a30 10 -12.568426
      11   7.180348
      12   1.584510
      13   3.419332
      14  -3.011810
  a40 15  -0.367091
      16   4.264955
      17   2.410733
      18   0.030926
      19   1.219653
c a50 20   0.110586
      21  -0.430263
      22   0.350308
      23   1.101523
      24  -1.371180
  a60 25  -0.003683
      26   0.069884
      27   0.206635
      28   0.356708
      29   0.111380
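For the division the question actually asks about, the same three steps can be sketched as below. This is only a sketch under a few assumptions of mine: the seeded generator is there for reproducibility, and the trailing .dropna() is there because unstacking pads the frame with NaN and, depending on the pandas version, stack may or may not drop that padding again. The reindex at the end just puts the rows back in the order of dfa.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only

# Same shape of data as in the question: dfa on (a, b, c), dfb on (a, b).
dfa = pd.DataFrame({
    "a": [["a", "b", "c"][k // 10] for k in range(30)],
    "b": ["a" + str([10, 20, 30, 40, 50, 60][k // 5]) for k in range(30)],
    "c": np.arange(30),
    "d": rng.normal(size=30),
}).set_index(["a", "b", "c"])
dfb = pd.DataFrame({
    "a": [["a", "b", "c"][k // 2] for k in range(6)],
    "b": ["a" + str([10, 20, 30, 40, 50, 60][k]) for k in range(6)],
    "m": rng.normal(size=6) ** 2,
}).set_index(["a", "b"])

ratio = (
    dfa.unstack("c")            # move the non-matching level into the columns
       .div(dfb["m"], axis=0)   # divide row-wise, aligned on (a, b)
       .stack("c")              # move "c" back into the index
       .dropna()                # discard NaN padding left over from unstack
       .reindex(dfa.index)      # restore the original row order
)
print(ratio.head())
```

The result is a one-column DataFrame (still named d) indexed by (a, b, c), holding d / m for every row of dfa.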

Notice two things:

  1. The right-hand operand has to be a Series, not a DataFrame; otherwise there's an additional complication about which columns of dfb to use for the multiplication. That is why dfb.squeeze() is used; you could replace it with dfb['m'].
  2. If the non-matching index level is not already the last of the three, the order of the index levels is not preserved. In that case, do what @jreback suggests and reorder the index levels afterwards: .reorder_levels(dfa.index.names)
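To illustrate the second point, here is a small sketch with made-up data in which the non-matching level "c" sits in the middle of the index. The names left and right and all the values are hypothetical and not from the question; the .dropna() is again my addition to cope with versions of stack that keep the NaN padding.

```python
import pandas as pd

# Hypothetical frame indexed by (a, c, b): "c" is not the last level.
left = pd.DataFrame({
    "a": ["x", "x", "x", "y", "y", "y"],
    "c": [0, 1, 2, 3, 4, 5],
    "b": ["p", "p", "q", "r", "r", "r"],
    "d": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}).set_index(["a", "c", "b"])

# Hypothetical divisor indexed by (a, b) only.
right = pd.Series(
    [2.0, 4.0, 8.0],
    index=pd.MultiIndex.from_tuples(
        [("x", "p"), ("x", "q"), ("y", "r")], names=["a", "b"]
    ),
    name="m",
)

stacked = left.unstack("c").div(right, axis=0).stack("c").dropna()
print(stacked.index.names)   # "c" has moved to the end of the index

# Put the levels back in their original order, then the rows.
restored = stacked.reorder_levels(left.index.names).reindex(left.index)
print(restored.index.names)
```

After the stack step the levels come back as (a, b, c); reorder_levels restores (a, c, b) without touching the values.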
answered Oct 20 '22 00:10 by LondonRob