So, this is my problem:
import numpy as np
import pandas as pd

dfa = pd.DataFrame({"a": [["a", "b", "c"][int(k/10)] for k in range(30)],
                    "b": ["a" + repr([10, 20, 30, 40, 50, 60][int(k/5)]) for k in range(30)],
                    "c": np.arange(30),
                    "d": np.random.normal(size=30)}).set_index(["a","b","c"])
dfb = pd.DataFrame({"a": [["a", "b", "c"][int(k/2)] for k in range(6)],
                    "b": ["a" + repr([10, 20, 30, 40, 50, 60][k]) for k in range(6)],
                    "m": np.random.normal(size=6)**2}).set_index(["a","b"])
Essentially I have two dataframes with multi-indices and I want to divide dfa.d by dfb.m, joining on ("a", "b"). I can't naively do dfa.d / dfb.m or join, because it says that "merging with more than one level overlap on a multi-index is not implemented".
The most straightforward (lol) way of doing this that I found is:
dfc = dfa.reset_index().set_index(["a", "b"]).join(dfb)
dfc["r"] = dfc.d / dfc.m
dfd = dfc.reset_index().set_index(["a", "b", "c"])[["r"]]
Any shortcuts?
There's an open bug for this problem and the current milestone says 0.15.1.
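Until something nicer comes along, there's a workaround involving the following steps:
1. Get the non-matching index level(s) out of the way by unstacking them into columns.
2. Perform the multiplication or division.
3. stack the columns back to where they were.
Like this: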
In [109]: dfa.unstack('c').mul(dfb.squeeze(), axis=0).stack('c')
Out[109]:
                   d
a b   c
a a10 0     1.535221
      1    -2.151894
      2     1.986061
      3    -1.946031
      4    -4.868800
  a20 5    -2.278917
      6    -1.535684
      7     2.289102
      8    -0.442284
      9    -0.547209
b a30 10  -12.568426
      11    7.180348
      12    1.584510
      13    3.419332
      14   -3.011810
  a40 15   -0.367091
      16    4.264955
      17    2.410733
      18    0.030926
      19    1.219653
c a50 20    0.110586
      21   -0.430263
      22    0.350308
      23    1.101523
      24   -1.371180
  a60 25   -0.003683
      26    0.069884
      27    0.206635
      28    0.356708
      29    0.111380
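The line above multiplies, while the question asks for a division. Here is a minimal sketch of the same unstack/stack idea with div, checked against the reset_index/join approach from the question. The seeded generator, the trailing .dropna().reindex(dfa.index), and the names ratio and expected are my own additions, not part of the original answer. The dropna/reindex step is there because, as far as I know, newer pandas versions changed how stack handles missing values and row order.

```python
import numpy as np
import pandas as pd

# Same shapes as in the question; a seeded generator keeps the run repeatable.
rng = np.random.default_rng(0)
dfa = pd.DataFrame({
    "a": [["a", "b", "c"][k // 10] for k in range(30)],
    "b": ["a" + str([10, 20, 30, 40, 50, 60][k // 5]) for k in range(30)],
    "c": np.arange(30),
    "d": rng.normal(size=30),
}).set_index(["a", "b", "c"])
dfb = pd.DataFrame({
    "a": [["a", "b", "c"][k // 2] for k in range(6)],
    "b": ["a" + str([10, 20, 30, 40, 50, 60][k]) for k in range(6)],
    "m": rng.normal(size=6) ** 2,
}).set_index(["a", "b"])

# Workaround: move "c" into the columns, divide row-wise by m, move "c" back.
ratio = (
    dfa.unstack("c")
    .div(dfb["m"], axis=0)   # aligns on the remaining ("a", "b") index
    .stack("c")
    .dropna()                # drop the (a, b, c) combinations that never existed
    .reindex(dfa.index)      # restore the original row order
    .rename(columns={"d": "r"})
)

# Reference result, computed the way the question does it.
dfc = dfa.reset_index().set_index(["a", "b"]).join(dfb)
dfc["r"] = dfc.d / dfc.m
expected = dfc.reset_index().set_index(["a", "b", "c"])[["r"]]

print(np.allclose(ratio["r"].to_numpy(), expected["r"].to_numpy()))
```

Both routes should give the same numbers; the unstack version just avoids rebuilding the index by hand.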
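Notice two things:
1. dfb has to be a Series, otherwise there's additional complication about which columns of dfb to use for the multiplication. You could replace dfb.squeeze() with dfb['m'].
2. stack puts the restored level last, so if the unstacked level was not the innermost one to begin with, the original level order is lost. To get it back, append .reorder_levels(dfa.index.names).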
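To illustrate the reorder_levels point: stack appends the level it restores as the innermost index level, so the level order only survives by accident when that level was already last. A small sketch with made-up data (the frames left and right and their values are hypothetical, chosen only so that "c" sits in the middle of the index):

```python
import pandas as pd

# Hypothetical frame whose non-matching level "c" is the middle index level.
left = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "c": [0, 1, 2, 3],
    "b": ["p", "p", "q", "q"],
    "d": [1.0, 2.0, 3.0, 4.0],
}).set_index(["a", "c", "b"])

# Hypothetical multipliers indexed by ("a", "b") only.
right = pd.Series(
    [10.0, 100.0],
    index=pd.MultiIndex.from_tuples([("x", "p"), ("y", "q")], names=["a", "b"]),
    name="m",
)

# Same unstack / multiply / stack workaround as above.
stacked = left.unstack("c").mul(right, axis=0).stack("c").dropna()
print(list(stacked.index.names))   # "c" has moved to the end

# Put the levels back in the order they had in the original frame.
restored = stacked.reorder_levels(left.index.names)
print(list(restored.index.names))
```

reorder_levels only changes the order of the levels, not the order of the rows, so a sort_index() afterwards may still be wanted.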