Toy example
Suppose that base_df is the tiny dataframe shown below:
In [221]: base_df
Out[221]:
     seed
I S
0 a     0
  b     1
1 a     2
  b     3
Note that base_df has a 2-level multi-index for the rows. (Part of the problem here involves "propagating" this multi-index's values in a derived dataframe.)
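For anyone who wants to reproduce this, here is a minimal sketch of one way such a base_df could be built. The construction itself is an assumption (only the printed form is given above); the level names I and S and the seed values are taken from that printout.

```python
import pandas as pd

# Build the 2-level row index (levels "I" and "S") from explicit tuples.
index = pd.MultiIndex.from_tuples(
    [(0, "a"), (0, "b"), (1, "a"), (1, "b")],
    names=["I", "S"],
)

# One column, "seed", with one integer per row of the index.
base_df = pd.DataFrame({"seed": [0, 1, 2, 3]}, index=index)

print(base_df)
```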
Now, the function fn (definition given at the end of this post) takes an integer seed as argument and returns a 1-column dataframe indexed by string keys [1]. For example:
In [222]: fn(0)
Out[222]:
              F
key
01011  0.592845
10100  0.844266
In [223]: fn(1)
Out[223]:
              F
key
11110  0.997185
01000  0.932557
11100  0.128124
I want to generate a new dataframe, in essence, by applying fn to every row of base_df, and concatenating the resulting dataframes vertically. More specifically, the desired result would look like this:
                  F
I S key
0 a 01011  0.592845
    10100  0.844266
  b 11110  0.997185
    01000  0.932557
    11100  0.128124
1 a 01101  0.185082
    01110  0.931541
  b 00100  0.070725
    11011  0.839949
    11111  0.121329
    11000  0.569311
In other words, conceptually, the desired dataframe is obtained by generating one "sub-dataframe" for each row of base_df, and concatenating these sub-dataframes vertically. The sub-dataframe corresponding to each row has a 3-level multi-index. The first two levels (I and S) of this multi-index come from base_df's multi-index value for that row, while its last level (key), as well as the values of the (lone) F column, come from the dataframe returned by fn for that row's seed value.
The part I'm not clear on is how to propagate the row's original multi-index value to the rows of the dataframe created by fn for that row's seed value.
IMPORTANT: I'm looking for a way to do this that is agnostic to the names of the base_df's multi-index's levels, and their number.
I tried the following
base_df.apply(lambda row: fn(row.seed), axis=1)
...but the evaluation fails with the error
ValueError: Shape of passed values is (4, 2), indices imply (4, 1)
Is there some convenient way to do what I'm trying to do?
Here's the definition of fn. Its internals are unimportant as far as this question is concerned. What matters is that it takes an integer seed as argument, and returns a dataframe, as described earlier.
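import numpy
import pandas

def fn(seed, _spec='{{0:0{0:d}b}}'.format(5)):
    # _spec evaluates to '{0:05b}': a zero-padded, 5-digit binary format
    numpy.random.seed(int(seed))
    n = numpy.random.randint(2, 5)  # number of rows: 2, 3, or 4
    r = numpy.random.rand(n)
    # materialize the keys as a list (map is lazy on Python 3)
    k = list(map(_spec.format, numpy.random.randint(0, 31, size=n)))
    result = pandas.DataFrame(r, columns=['F'], index=k)
    result.index.name = 'key'
    return result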
[1] In this example, these keys happen to correspond to the binary representation of some integer between 0 and 31, inclusive, but this fact plays no role in the question.
Option 1
groupby
base_df.groupby(level=[0, 1]).apply(fn)
                  F
I S key
0 a 11010  0.385245
    00010  0.890244
    00101  0.040484
  b 01001  0.569204
    11011  0.802265
    00100  0.063107
1 a 00100  0.947827
    00100  0.056551
    11000  0.084872
  b 11110  0.592641
    00110  0.130423
    11101  0.915945
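Note that fn expects an integer seed, whereas groupby(...).apply hands each group to the function as a sub-dataframe, and level=[0, 1] hardcodes the number of index levels. A possible variant that pulls the seed out of each group and groups on however many levels the index has is sketched below. It assumes every index tuple in base_df is unique (so each group has exactly one row); base_df is rebuilt here only to keep the snippet self-contained, and fn is the question's function with the same logic.

```python
import numpy as np
import pandas as pd


def fn(seed, _spec="{0:05b}"):
    # Same logic as the question's fn: seeded random values keyed by
    # 5-digit binary strings.
    np.random.seed(int(seed))
    n = np.random.randint(2, 5)
    r = np.random.rand(n)
    k = [_spec.format(int(i)) for i in np.random.randint(0, 31, size=n)]
    result = pd.DataFrame(r, columns=["F"], index=k)
    result.index.name = "key"
    return result


# Toy input (an assumed reconstruction of the question's base_df).
index = pd.MultiIndex.from_tuples(
    [(0, "a"), (0, "b"), (1, "a"), (1, "b")], names=["I", "S"]
)
base_df = pd.DataFrame({"seed": [0, 1, 2, 3]}, index=index)

# Group on every index level, whatever their number or names.
all_levels = list(range(base_df.index.nlevels))

# Each group is a one-row sub-dataframe; extract its seed before calling fn.
result = base_df.groupby(level=all_levels, group_keys=True).apply(
    lambda group: fn(group["seed"].iloc[0])
)

print(result)
```

Because the group keys are taken from the index levels, their names (I and S here) are carried over to the result, followed by key.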
Option 2
pd.concat
pd.concat({t.Index: fn(t.seed) for t in base_df.itertuples()})
                  F
    key
0 a 11011  0.592845
    00011  0.844266
  b 00101  0.997185
    01111  0.932557
    00000  0.128124
1 a 01011  0.185082
    10010  0.931541
  b 10011  0.070725
    01010  0.839949
    01011  0.121329
    11001  0.569311
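In the output above the two outer levels are unnamed. If the original level names should be carried over, one option is to pass them through pd.concat's names argument, as in the sketch below. It assumes the column really is called seed (so that t.seed works on the tuples from itertuples); base_df is rebuilt here only to keep the snippet self-contained, and fn is the question's function with the same logic.

```python
import numpy as np
import pandas as pd


def fn(seed, _spec="{0:05b}"):
    # Same logic as the question's fn: seeded random values keyed by
    # 5-digit binary strings.
    np.random.seed(int(seed))
    n = np.random.randint(2, 5)
    r = np.random.rand(n)
    k = [_spec.format(int(i)) for i in np.random.randint(0, 31, size=n)]
    result = pd.DataFrame(r, columns=["F"], index=k)
    result.index.name = "key"
    return result


# Toy input (an assumed reconstruction of the question's base_df).
index = pd.MultiIndex.from_tuples(
    [(0, "a"), (0, "b"), (1, "a"), (1, "b")], names=["I", "S"]
)
base_df = pd.DataFrame({"seed": [0, 1, 2, 3]}, index=index)

# Map each row's index value (a tuple for a multi-index) to fn's output.
pieces = {t.Index: fn(t.seed) for t in base_df.itertuples()}

# The dict keys become the outer index levels; `names` labels those levels,
# and the inner level keeps the name ("key") set by fn.
result = pd.concat(pieces, names=list(base_df.index.names))

print(result)
```

Nothing here refers to I or S by name or count, so the same two lines should work for an index with a different number of levels.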