Returning multiple values from pandas apply on a DataFrame

Tags:

pandas

I'm using a Pandas DataFrame to do a row-wise t-test as per this example:

import numpy
import pandas

# log2 of negative values gives NaN, so those rows are dropped afterwards
df = pandas.DataFrame(numpy.log2(numpy.random.randn(1000, 4)),
                      columns=["a", "b", "c", "d"])
df = df.dropna()

Now, supposing I have "a" and "b" as one group, and "c" and "d" as the other, I'm performing the t-test row-wise. This is fairly trivial with pandas, using apply with axis=1. However, I can either return a DataFrame of the same shape if my function doesn't aggregate, or a Series if it aggregates.

Normally I would just output the p-value (so, aggregation) but I would like to generate an additional value based on other calculations (in other words, return two values). I can of course do two runs, aggregating the p-values first, then doing the other work, but I was wondering if there is a more efficient way to do so as the data is reasonably large.

As an example of the calculation, a hypothetical function would be:

from scipy.stats import ttest_ind

def t_test_and_mean(series, first, second):
    first_group = series[first]
    second_group = series[second]
    _, pvalue = ttest_ind(first_group, second_group)

    mean_ratio = second_group.mean() / first_group.mean()

    return (pvalue, mean_ratio)

Then invoked with

df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1)

Of course, in this case it returns a single Series whose values are the (pvalue, mean_ratio) tuples.

Instead, my expected output would be a DataFrame with two columns, one for the first result and one for the second. Is this possible, or do I have to do two runs for the two calculations and then merge them together?

305

asked May 25 '12 08:05

Einar

1 Answer

Returning a Series, rather than a tuple, should produce a new multi-column DataFrame. For example,

return pandas.Series({'pvalue': pvalue, 'mean_ratio': mean_ratio})
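Put together, a minimal sketch of the whole thing might look like the following. The sample data (20 rows of strictly positive values, so that log2 produces no NaN) and the helper name as_tuple are assumptions made for illustration, not part of the question. The second half shows an alternative that keeps a tuple return and passes result_type="expand" to apply, which is available in pandas 0.23 and later.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def t_test_and_mean(row, first, second):
    first_group = row[first]
    second_group = row[second]
    _, pvalue = ttest_ind(first_group, second_group)
    mean_ratio = second_group.mean() / first_group.mean()
    # Returning a Series makes apply build one output column per label.
    return pd.Series({"pvalue": pvalue, "mean_ratio": mean_ratio})


# Illustrative data (an assumption): values in [1, 100) keep log2 finite.
rng = np.random.default_rng(0)
df = pd.DataFrame(np.log2(rng.uniform(1.0, 100.0, size=(20, 4))),
                  columns=["a", "b", "c", "d"])

result = df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1)


# Alternative: return a plain tuple and let apply expand it into columns.
def as_tuple(row, first, second):  # hypothetical helper name
    return tuple(t_test_and_mean(row, first, second))


expanded = df.apply(as_tuple, first=["a", "b"], second=["c", "d"],
                    axis=1, result_type="expand")
# Expanded columns have no names from the tuple, so label them explicitly.
expanded.columns = ["pvalue", "mean_ratio"]
```

Both result and expanded are DataFrames with one row per input row and the columns pvalue and mean_ratio, so a single pass over the data is enough. Building a Series per row adds overhead, so on large data the tuple variant may be worth timing against it.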

147

answered Sep 18 '22 15:09

Garrett
