Returning multiple values from pandas apply on a DataFrame

Tags: python, pandas

I'm using a Pandas DataFrame to do a row-wise t-test as per this example:

    import numpy
    import pandas

    # log2 of a negative draw is NaN; dropna() removes those rows
    df = pandas.DataFrame(numpy.log2(numpy.random.randn(1000, 4)),
                          columns=["a", "b", "c", "d"])
    df = df.dropna()

Now, supposing I have "a" and "b" as one group, and "c" and "d" as the other, I'm performing the t-test row-wise. This is fairly trivial with pandas, using apply with axis=1. However, I can either return a DataFrame of the same shape if my function doesn't aggregate, or a Series if it aggregates.

Normally I would just output the p-value (so, aggregation) but I would like to generate an additional value based on other calculations (in other words, return two values). I can of course do two runs, aggregating the p-values first, then doing the other work, but I was wondering if there is a more efficient way to do so as the data is reasonably large.

As an example of the calculation, a hypothetical function would be:

    from scipy.stats import ttest_ind

    def t_test_and_mean(series, first, second):
        first_group = series[first]
        second_group = series[second]
        _, pvalue = ttest_ind(first_group, second_group)

        mean_ratio = second_group.mean() / first_group.mean()

        return (pvalue, mean_ratio)

Then invoked with

df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1) 

Of course, in this case it returns a single Series whose values are (pvalue, mean_ratio) tuples.

Instead, my expected output would be a DataFrame with two columns, one for the first result and one for the second. Is this possible, or do I have to do two runs for the two calculations and then merge them together?

Einar asked May 25 '12 08:05



1 Answer

Returning a Series, rather than tuple, should produce a new multi-column DataFrame. For example,

return pandas.Series({'pvalue': pvalue, 'mean_ratio': mean_ratio}) 
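To make this concrete, here is a minimal end-to-end sketch of the function from the question with only the return line changed. The seeded uniform data, the 20-row size, and the helper names `_pvalue_and_ratio` and `t_test_and_mean_tuple` are illustrative assumptions, not part of the original post. The second variant keeps the tuple return and relies on `result_type="expand"`, which assumes a pandas version that supports that argument of `DataFrame.apply`.

```python
import numpy
import pandas
from scipy.stats import ttest_ind


def _pvalue_and_ratio(series, first, second):
    # Hypothetical helper: the calculation from the question, unchanged.
    first_group = series[first]
    second_group = series[second]
    _, pvalue = ttest_ind(first_group, second_group)
    mean_ratio = second_group.mean() / first_group.mean()
    return pvalue, mean_ratio


def t_test_and_mean(series, first, second):
    pvalue, mean_ratio = _pvalue_and_ratio(series, first, second)
    # Returning a Series makes apply build one output column per label.
    return pandas.Series({"pvalue": pvalue, "mean_ratio": mean_ratio})


def t_test_and_mean_tuple(series, first, second):
    # Same calculation, but returning a plain tuple as in the question.
    return _pvalue_and_ratio(series, first, second)


# Illustrative data only: strictly positive values from a seeded generator.
rng = numpy.random.default_rng(0)
df = pandas.DataFrame(rng.uniform(1.0, 2.0, size=(20, 4)),
                      columns=["a", "b", "c", "d"])

# Variant 1: the function returns a labelled Series.
result = df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1)

# Variant 2: the function returns a tuple and apply expands it into columns.
expanded = df.apply(t_test_and_mean_tuple, first=["a", "b"], second=["c", "d"],
                    axis=1, result_type="expand")
expanded.columns = ["pvalue", "mean_ratio"]

print(result.head())
```

Both variants give one row per input row and one column per returned value, in a single pass. Building a Series for every row adds some per-row overhead, so on large data it may be worth timing the two against each other.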
Garrett answered Sep 18 '22 15:09