Python equivalent for do.call(rbind, lapply()) from R

Question

One of the main tools in my workflow is do.call(rbind, lapply()), as in this R example:

df1 <- data.frame(x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
df2 <- data.frame(x1 = rnorm(10, 5), x2 = rnorm(10), x3 = rnorm(10))

getp <- function(var) {
  return(t.test(df1[, var], df2[, var])$p.value)
}

list <- c('x1', 'x2', 'x3')
ps <- do.call(rbind, lapply(list, getp))
ps
                 [,1]
[1,] 6.232025e-09
[2,] 2.128019e-09
[3,] 5.824713e-08

This creates a nice column of p-values. In the real world I would pull out a one-row data.frame with each column holding a useful model statistic, the goal being to iterate over many columns with the same model type and compare the fit/effects.

In Python, I can create a similar function:

from statsmodels.stats.weightstats import ttest_ind 
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x1': np.random.randn(10), 'x2': np.random.randn(10), 'x3': np.random.randn(10)})
df2 = pd.DataFrame({'x1': np.random.randn(10) + 5, 'x2': np.random.randn(10) + 5, 'x3': np.random.randn(10) + 5})

def getp(var):
    print(ttest_ind(df1[var], df2[var])[1])

vars = ['x1', 'x2', 'x3']

I can print all the p-values to the console with:

for i in vars:
    getp(i)

9.67944232638e-08
1.82163637251e-08
2.00410346438e-10

But I'd like to save the result as an object with one column and three rows, as in R. Is this possible?

Thanks!

The actual function may look something like this:

def getMoreThanP(var):
    out = pd.DataFrame({'mean1' : [np.mean(df1[var])], 'mean2' : [np.mean(df2[var])], 'pvalue' : [ttest_ind(df1[var], df2[var])[1]]})
    print(out)

for i in vars:
    getMoreThanP(i)

     mean1     mean2        pvalue
0  0.24452  4.824327  2.438985e-11
      mean1     mean2        pvalue
0  0.187176  4.969862  1.115546e-11
      mean1     mean2        pvalue
0  0.035759  5.249378  1.525264e-08

ayhan · Accepted Answer

Instead of passing variables one by one, you can pass all three:

ttest_ind(df1[vars], df2[vars])[1]
Out[85]: array([  4.97835813e-11,   8.30544748e-08,   9.24917262e-07])

The returned object is a one-dimensional array. If you want a DataFrame instead:

pd.DataFrame(ttest_ind(df1[vars], df2[vars])[1])
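If you also want the rows labelled by variable name, as R would do with named input, something along these lines may help. This is only a sketch under a few assumptions: it swaps in scipy.stats.ttest_ind (whose default is also a pooled-variance test) in place of the statsmodels function used above, the seed and the names cols, result and ps are made up for illustration, and the data is regenerated so the block runs on its own.

```python
import numpy as np
import pandas as pd
from scipy import stats  # assumption: SciPy stands in for statsmodels here

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
cols = ['x1', 'x2', 'x3']
df1 = pd.DataFrame(rng.normal(0, 1, size=(10, 3)), columns=cols)
df2 = pd.DataFrame(rng.normal(5, 1, size=(10, 3)), columns=cols)

# One call tests every column: axis=0 compares the two samples column by column.
result = stats.ttest_ind(df1[cols].to_numpy(), df2[cols].to_numpy(), axis=0)

# Label the rows with the variable names and the column with the statistic.
ps = pd.DataFrame({'pvalue': result.pvalue}, index=cols)
print(ps)
```

The result is a three-row, one-column DataFrame, so ps.loc['x2', 'pvalue'] looks up a single p-value by name.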

Passing several columns at once works because ttest_ind accepts array-like objects and tests each column separately. For getMoreThanP, you can use a combination of pd.concat and map:

def getMoreThanP(var):
    out = pd.DataFrame({'mean1' : [np.mean(df1[var])], 'mean2' : [np.mean(df2[var])], 'pvalue' : [ttest_ind(df1[var], df2[var])[1]]})
    return out

pd.concat(map(getMoreThanP, vars))
# pd.concat(map(getMoreThanP, vars), ignore_index=True) if you want to reset index
Out[134]: 
      mean1     mean2        pvalue
0 -0.021791  4.964985  4.978358e-11
0  0.087019  4.610332  8.305447e-08
0 -0.084168  4.680124  9.249173e-07

Note that I changed the definition of getMoreThanP to return the dataframe instead of printing it.
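A possible variation on the same idea: have the function return a plain dict per variable and build the DataFrame once, with a list comprehension playing the role of lapply and the DataFrame constructor playing the role of do.call(rbind, ...). Treat this as a sketch rather than the definitive way to do it. The name get_stats and the seed are hypothetical, scipy.stats.ttest_ind is again swapped in for the statsmodels call, and the data is regenerated so the block is self-contained.

```python
import numpy as np
import pandas as pd
from scipy import stats  # assumption: SciPy stands in for statsmodels here

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable
cols = ['x1', 'x2', 'x3']
df1 = pd.DataFrame(rng.normal(0, 1, size=(10, 3)), columns=cols)
df2 = pd.DataFrame(rng.normal(5, 1, size=(10, 3)), columns=cols)

def get_stats(var):
    """Return the summary statistics for one column as a plain dict."""
    a = df1[var].to_numpy()
    b = df2[var].to_numpy()
    return {
        'mean1': a.mean(),
        'mean2': b.mean(),
        'pvalue': stats.ttest_ind(a, b).pvalue,
    }

# List comprehension ~ lapply; one DataFrame call ~ do.call(rbind, ...).
# Each dict becomes a row, and index=cols labels the rows by variable.
summary = pd.DataFrame([get_stats(v) for v in cols], index=cols)
print(summary)
```

Building from a list of dicts avoids creating many one-row DataFrames, which tends to matter more as the number of columns grows.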

Tags: python, pandas, r, lapply

Andrew Taylor
