pandas apply function on multiindex

Tags: python, pandas, multi-index

I would like to apply a function to a DataFrame with MultiIndex columns (basically the result of groupby(...).describe()) without using a for loop to traverse the level-0 column index.

Function I'd like to apply:

def CI(x):
    import math
    sigma = x["std"]
    n = x["count"]
    return 1.96 * sigma / math.sqrt(n)
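
For readers who want to check the function in isolation: it appears to compute the half-width of a 95% normal-approximation confidence interval for the mean, 1.96 * std / sqrt(count). The sketch below applies it to a single hand-made row; the `row` Series and its values are made up for illustration and are not taken from the question's data.

```python
import math

import pandas as pd


def CI(x):
    # x is expected to be a row (Series) holding "std" and "count" entries
    sigma = x["std"]
    n = x["count"]
    return 1.96 * sigma / math.sqrt(n)


# Hypothetical row: standard deviation 2.0 over 16 observations
row = pd.Series({"std": 2.0, "count": 16.0})
half_width = CI(row)  # 1.96 * 2.0 / 4.0
print(half_width)
```

Because the function only looks up "std" and "count" by label, it works on any row of a describe() result for a single top-level column.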

Sample of my dataframe:

df = df.iloc[47:52, [3,4,-1]]

               a          b                    id
47          0.218182   0.000000  0d1974107c6731989c762e96def73568
48          0.000000   0.000000  0d1974107c6731989c762e96def73568
49          0.218182   0.130909  0d1974107c6731989c762e96def73568
50          0.000000   0.000000  0fd4f3b4adf43682f08e693a905b7432
51          0.000000   0.000000  0fd4f3b4adf43682f08e693a905b7432

I then replace zeros with NaN:

df = df.replace(float(0), np.nan)

Grouping by id and calling describe() gives me a DataFrame with MultiIndex columns:

df_group = df.groupby("id").describe()
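
To make the column structure concrete, here is a small sketch that rebuilds the five sample rows shown above by hand (an assumption: the real frame has more rows and columns) and inspects the two column levels that describe() produces.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the five sample rows from the question
df = pd.DataFrame(
    {
        "a": [0.218182, 0.0, 0.218182, 0.0, 0.0],
        "b": [0.0, 0.0, 0.130909, 0.0, 0.0],
        "id": ["0d1974107c6731989c762e96def73568"] * 3
        + ["0fd4f3b4adf43682f08e693a905b7432"] * 2,
    },
    index=range(47, 52),
)
df = df.replace(float(0), np.nan)
df_group = df.groupby("id").describe()

# Level 0 holds the original column names, level 1 the statistics
top_level = list(df_group.columns.get_level_values(0).unique())
stats = list(df_group["a"].columns)
print(top_level)
print(stats)
```

Selecting a top-level name such as df_group["a"] returns an ordinary frame whose columns are the statistics, which is why the row-wise function can read "std" and "count" from it.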

My current solution, which I don't like and think could be improved:

# Use fresh names so the loop does not overwrite df or the CI function
l_df = []
for column in df_group.columns.levels[0]:
    ci_col = pd.DataFrame({"CI": df_group[column].apply(CI, axis=1)})
    l_df.append(ci_col)
ci = pd.concat(l_df, axis=1)
ci.columns = df_group.columns.levels[0]

This gives me something like:

                                    a       b
id
06f32e6e45da385834dac983256d59f3    nan     nan
0d1974107c6731989c762e96def73568    0.005   0.225
0fd4f3b4adf43682f08e693a905b7432    0.008   nan
11e0057cdc8b8e1b1cdabfa8a092ea5f    0.018   0.582
120549af6977623bd01d77135a91a523    0.008   0.204

So again: if I have top-level columns from a to z, each containing a std and a count column, how can I apply my function to all of them at once?


asked Sep 07 '17 13:09

LostBoardOnTaurangaBeach

1 Answer

Using groupby on level=0 with axis=1 lets you iterate over the first-level columns and apply the function to each group.

In [104]: (df.groupby("id").describe()
             .groupby(level=0, axis=1)
             .apply(lambda x: x[x.name].apply(CI, axis=1)))
Out[104]:
                                    a   b
id
0d1974107c6731989c762e96def73568  0.0 NaN
0fd4f3b4adf43682f08e693a905b7432  NaN NaN

In fact, you don't need CI at all if you inline the calculation:

In [105]: (df.groupby("id").describe()
             .groupby(level=0, axis=1)
             .apply(lambda g: g[g.name].apply(
                 lambda row: 1.96 * row['std'] / np.sqrt(row['count']), axis=1)))
Out[105]:
                                    a   b
id
0d1974107c6731989c762e96def73568  0.0 NaN
0fd4f3b4adf43682f08e693a905b7432  NaN NaN

Sample df

In [106]: df
Out[106]:
           a         b                                id
47  0.218182       NaN  0d1974107c6731989c762e96def73568
48       NaN       NaN  0d1974107c6731989c762e96def73568
49  0.218182  0.130909  0d1974107c6731989c762e96def73568
50       NaN       NaN  0fd4f3b4adf43682f08e693a905b7432
51       NaN       NaN  0fd4f3b4adf43682f08e693a905b7432
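
As an alternative sketch (not part of the approach above), the same table can be computed without any row-wise apply by taking cross-sections of the second column level with DataFrame.xs and doing the arithmetic on whole frames. This may also be useful because, as far as I'm aware, groupby(..., axis=1) emits a deprecation warning in pandas 2.1 and later. The names desc, std, count and ci are illustrative, and the frame is rebuilt by hand from the sample df shown above.

```python
import numpy as np
import pandas as pd

ID_1 = "0d1974107c6731989c762e96def73568"
ID_2 = "0fd4f3b4adf43682f08e693a905b7432"

# Hand-built copy of the sample df (zeros already replaced by NaN)
df = pd.DataFrame(
    {
        "a": [0.218182, np.nan, 0.218182, np.nan, np.nan],
        "b": [np.nan, np.nan, 0.130909, np.nan, np.nan],
        "id": [ID_1] * 3 + [ID_2] * 2,
    },
    index=range(47, 52),
)

desc = df.groupby("id").describe()

# Pull one statistic per top-level column; each result has columns a, b
std = desc.xs("std", axis=1, level=1)
count = desc.xs("count", axis=1, level=1)

# Element-wise arithmetic aligns on both the id index and the a/b columns
ci = 1.96 * std / np.sqrt(count)
print(ci)
```

Because std and count share the same index and column labels, the division lines up cell by cell, so this scales to any number of top-level columns without a loop.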


answered Nov 01 '22 20:11

Zero
