Apply function to each row in Pandas dataframe by group

Question

I built a Pandas dataframe (example below) indexed by gene name that has sample names for columns and integers as cell values. What I want to do is run an ANOVA (f_oneway(), from scipy.stats) for lists of row values as defined by lists of the columns corresponding to groups of samples. Those results would then be stored in a new Pandas dataframe with group names as columns and the same genes for index.

An example of the dataframe (it's returned from another function):

import pandas as pd
counts = {'sample1' : [0, 1, 5, 0, 10],
        'sample2' : [2, 0, 10, 0, 0],
        'sample3' : [0, 0, 0, 1, 0],
        'sample4' : [10, 0, 1, 4, 0]}
data = pd.DataFrame(counts, columns = ['sample1', 'sample2', 'sample3', 'sample4'],
        index = ['gene1', 'gene2', 'gene3', 'gene4', 'gene5'])

Groups are imported as arguments by main(), so in this function I have:

def compare(out_prefix, pops, data):
    import scipy.stats as stats
    sig = pd.DataFrame(index=data.index)

#groups will look like:
#groups = [['sample1', 'sample2'],['sample3', 'sample4']]

    for pop in pops:
        with open(pop) as infile:
            group_s = []
            for spl in infile:
                group_s.append(spl.replace("\n",""))

        mean_col = pop.split(".")[0]+"_mean"
        std_col = pop.split(".")[0]+"_std"
        stat_col = pop.split(".")[0]+"_stat"
        p_col = pop.split(".")[0]+"_sig"

        sig[mean_col] = data[group_s].mean(axis=1)
        sig[std_col] = data[group_s].std(axis=1)

        sig[[stat_col, p_col]] = data.apply(lambda row : stats.f_oneway(data.loc[group_s].values.tolist()))

This last line doesn't work, and after a few days of googling I still can't see how to do it. Could someone point me in the right direction? Ideally, it would write the results of the ANOVA test (F statistic, p-value) for each row to the columns stat_col and p_col in sig, with the samples of each group forming one ANOVA group. For gene1 it would feed stats.f_oneway a list of lists holding the values for the samples in each group, e.g. [[0, 2], [0, 10]].

Thanks in advance!

dokteurwho · Accepted Answer

Try this:

group = ['sample1', 'sample2']

On your sample:

data[group].T

looks like:

         gene1  gene2  gene3  gene4  gene5
sample1      0      1      5      0     10
sample2      2      0     10      0      0

and finally:

import scipy.stats as stats

anova = stats.f_oneway(*data[group].T.values)
print(anova.statistic, anova.pvalue)

The anova object holds the F statistic and the p-value:

0.0853333333333 0.777628169862
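
Note that the call above compares sample1 against sample2 across all genes. If you instead want one test per gene, with each group of samples forming one ANOVA group (the [[0, 2], [0, 10]] case for gene1 in the question), a minimal sketch could look like the one below. It assumes groups is the list of lists from the comment in the question; row_anova and the 'stat' / 'sig' column names are made up here for illustration.

```python
import pandas as pd
import scipy.stats as stats

counts = {'sample1': [0, 1, 5, 0, 10],
          'sample2': [2, 0, 10, 0, 0],
          'sample3': [0, 0, 0, 1, 0],
          'sample4': [10, 0, 1, 4, 0]}
data = pd.DataFrame(counts, index=['gene1', 'gene2', 'gene3', 'gene4', 'gene5'])

# Assumed grouping, copied from the comment in the question.
groups = [['sample1', 'sample2'], ['sample3', 'sample4']]

def row_anova(row):
    # Hypothetical helper: build one list of values per group for this gene,
    # e.g. [[0, 2], [0, 10]] for gene1, and run a one-way ANOVA on them.
    samples = [row[group].tolist() for group in groups]
    result = stats.f_oneway(*samples)
    return pd.Series({'stat': result.statistic, 'sig': result.pvalue})

# axis=1 passes each row (gene) to row_anova; returning a Series per row
# gives a dataframe with the same gene index and one column per Series key.
sig = data.apply(row_anova, axis=1)
print(sig)
```

With only two samples per group the test has very little power, and a gene whose values are constant within every group would make f_oneway emit a warning, so treat this as a starting point rather than a finished analysis.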

Tags: python, pandas, dataframe, scipy, anova

André Soares
