I'm using a Pandas DataFrame
to do a row-wise t-test as per this example:
import numpy import pandas df = pandas.DataFrame(numpy.log2(numpy.randn(1000, 4), columns=["a", "b", "c", "d"]) df = df.dropna()
Now, supposing I have "a" and "b" as one group, and "c" and "d" at the other, I'm performing the t-test row-wise. This is fairly trivial with pandas, using apply
with axis=1. However, I can either return a DataFrame of the same shape if my function doesn't aggregate, or a Series if it aggregates.
Normally I would just output the p-value (so, aggregation) but I would like to generate an additional value based on other calculations (in other words, return two values). I can of course do two runs, aggregating the p-values first, then doing the other work, but I was wondering if there is a more efficient way to do so as the data is reasonably large.
As an example of the calculation, a hypotethical function would be:
from scipy.stats import ttest_ind def t_test_and_mean(series, first, second): first_group = series[first] second_group = series[second] _, pvalue = ttest_ind(first_group, second_group) mean_ratio = second_group.mean() / first_group.mean() return (pvalue, mean_ratio)
Then invoked with
df.apply(t_test_and_mean, first=["a", "b"], second=["c", "d"], axis=1)
Of course in this case it returns a single Series with the two tuples as value.
Instead, ny expected output would be a DataFrame with two columns, one for the first result, and one for the second. Is this possible or I have to do two runs for the two calculations, then merge them together?
Return Multiple Columns from pandas apply() You can return a Series from the apply() function that contains the new data. pass axis=1 to the apply() function which applies the function multiply to each row of the DataFrame, Returns a series of multiple columns from pandas apply() function.
Return multiple columns using Pandas apply() method Objects passed to the pandas. apply() are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function.
Pandas apply() Function to Single & Multiple Column(s) Using pandas. DataFrame. apply() method you can execute a function to a single column, all and list of multiple columns (two or more).
apply is not faster in itself but it has advantages when used in combination with DataFrames. This depends on the content of the apply expression. If it can be executed in Cython space, apply is much faster (which is the case here). We can use apply with a Lambda function.
Returning a Series, rather than tuple, should produce a new multi-column DataFrame. For example,
return pandas.Series({'pvalue': pvalue, 'mean_ratio': mean_ratio})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With