
Applying dataframe-returning function to every row of base dataframe

Toy example

Suppose that base_df is the tiny dataframe shown below:

In [221]: base_df
Out[221]: 
     seed
I S      
0 a     0
  b     1
1 a     2
  b     3

Note that base_df has a 2-level multi-index for the rows. (Part of the problem here involves "propagating" this multi-index's values in a derived dataframe.)
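
For anyone who wants to reproduce this, a frame with the same shape could be built roughly as follows. This is only a sketch: the construction itself is an assumption, since only the printed result above comes from the original session.

```python
import pandas as pd

# Two-level row index with the level names shown above (I and S).
index = pd.MultiIndex.from_product([[0, 1], ['a', 'b']], names=['I', 'S'])

# One integer seed per row: 0, 1, 2, 3.
base_df = pd.DataFrame({'seed': range(4)}, index=index)
```

Each row label is then a tuple such as (0, 'a'), and that tuple is the value that needs to be carried over into the derived dataframe.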

Now, the function fn (definition given at the end of this post) takes an integer seed as argument and returns a 1-column dataframe indexed by string keys (see footnote 1). For example:

In [222]: fn(0)
Out[222]: 
              F
key            
01011  0.592845
10100  0.844266

In [223]: fn(1)
Out[223]: 
              F
key            
11110  0.997185
01000  0.932557
11100  0.128124

I want to generate a new dataframe, in essence, by applying fn to every row of base_df, and concatenating the resulting dataframes vertically. More specifically, the desired result would look like this:

                  F
I S key            
0 a 01011  0.592845
    10100  0.844266
  b 11110  0.997185
    01000  0.932557
    11100  0.128124
1 a 01101  0.185082
    01110  0.931541
  b 00100  0.070725
    11011  0.839949
    11111  0.121329
    11000  0.569311

In other words, the desired dataframe is conceptually obtained by generating one "sub-dataframe" for each row of base_df, and concatenating these sub-dataframes vertically. The sub-dataframe corresponding to each row has a 3-level multi-index. The first two levels (I and S) of this multi-index come from base_df's multi-index value for that row, while its last level (key), as well as the values for the (lone) F column, come from the dataframe returned by fn for that row's seed value.

The part I'm not clear on is how to propagate the row's original multi-index value to the rows of the dataframe created by fn for that row's seed value.

IMPORTANT: I'm looking for a way to do this that is agnostic to the names of the base_df's multi-index's levels, and their number.


I tried the following

base_df.apply(lambda row: fn(row.seed), axis=1)

...but the evaluation fails with the error

ValueError: Shape of passed values is (4, 2), indices imply (4, 1)

Is there some convenient way to do what I'm trying to do?


Here's the definition of fn. Its internals are unimportant as far as this question is concerned. What matters is that it takes an integer seed as argument, and returns a dataframe, as described earlier.

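import numpy
import pandas

def fn(seed, _spec='{{0:0{0:d}b}}'.format(5)):
    # _spec evaluates to '{0:05b}': a zero-padded, 5-digit binary string
    numpy.random.seed(int(seed))
    n = numpy.random.randint(2, 5)
    r = numpy.random.rand(n)
    # list(...) is needed on Python 3, where map returns a lazy iterator
    k = list(map(_spec.format, numpy.random.randint(0, 31, size=n)))
    result = pandas.DataFrame(r, columns=['F'], index=k)
    result.index.name = 'key'
    return result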

Footnote 1: In this example, these keys happen to correspond to the binary representation of some integer between 0 and 31, inclusive, but this fact plays no role in the question.

Asked by kjo, Feb 11 '26 12:02


1 Answer

Option 1
groupby

base_df.groupby(level=[0, 1]).apply(fn)

                  F
I S key            
0 a 11010  0.385245
    00010  0.890244
    00101  0.040484
  b 01001  0.569204
    11011  0.802265
    00100  0.063107
1 a 00100  0.947827
    00100  0.056551
    11000  0.084872
  b 11110  0.592641
    00110  0.130423
    11101  0.915945
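
Two caveats about the call above: `groupby(...).apply` hands each group to the function as a sub-dataframe rather than as a bare seed, and `level=[0, 1]` hard-codes the number of index levels. A level-agnostic variant might look like the sketch below. It assumes every index tuple in base_df is unique (one row per group), rebuilds base_df by hand, and uses a Python 3 compatible copy of the question's fn; `all_levels` is just an illustrative name.

```python
import numpy as np
import pandas as pd

def fn(seed, _spec='{0:05b}'):
    # Copy of the question's fn; list(...) keeps it working on Python 3.
    np.random.seed(int(seed))
    n = np.random.randint(2, 5)
    r = np.random.rand(n)
    k = list(map(_spec.format, np.random.randint(0, 31, size=n)))
    result = pd.DataFrame(r, columns=['F'], index=k)
    result.index.name = 'key'
    return result

# Assumed reconstruction of the question's base_df.
index = pd.MultiIndex.from_product([[0, 1], ['a', 'b']], names=['I', 'S'])
base_df = pd.DataFrame({'seed': range(4)}, index=index)

# Group on every index level, however many there are.
all_levels = list(range(base_df.index.nlevels))

# Each group is a one-row DataFrame, so extract the scalar seed before calling fn.
result = base_df.groupby(level=all_levels).apply(lambda g: fn(g['seed'].iloc[0]))
```

The group keys become the leading index levels of the result and keep their names, so nothing in the call depends on the levels being called I and S.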

Option 2
pd.concat

pd.concat({t.Index: fn(t.seed) for t in base_df.itertuples()})

                  F
    key            
0 a 11011  0.592845
    00011  0.844266
  b 00101  0.997185
    01111  0.932557
    00000  0.128124
1 a 01011  0.185082
    10010  0.931541
  b 10011  0.070725
    01010  0.839949
    01011  0.121329
    11001  0.569311
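
The first two level names come out blank above because the dict keys carry only the index values, not the level names. Passing `names=` to `pd.concat` should restore them. The sketch below assumes unique index tuples (duplicate dict keys would silently overwrite one another), a hand-built copy of base_df, and a Python 3 compatible copy of the question's fn; `pieces` is just an illustrative name.

```python
import numpy as np
import pandas as pd

def fn(seed, _spec='{0:05b}'):
    # Copy of the question's fn; list(...) keeps it working on Python 3.
    np.random.seed(int(seed))
    n = np.random.randint(2, 5)
    r = np.random.rand(n)
    k = list(map(_spec.format, np.random.randint(0, 31, size=n)))
    result = pd.DataFrame(r, columns=['F'], index=k)
    result.index.name = 'key'
    return result

# Assumed reconstruction of the question's base_df.
index = pd.MultiIndex.from_product([[0, 1], ['a', 'b']], names=['I', 'S'])
base_df = pd.DataFrame({'seed': range(4)}, index=index)

# Map each row's index value (a tuple here) to the dataframe fn returns for its seed.
pieces = {t.Index: fn(t.seed) for t in base_df.itertuples()}

# names= labels the outer levels; the inner level keeps the name 'key' from fn's index.
result = pd.concat(pieces, names=list(base_df.index.names))
```

Since the level names are read from `base_df.index.names`, the same call should work for any number of index levels.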
Answered by piRSquared, Feb 15 '26 00:02


