
pandas apply function that returns multiple values to rows in pandas dataframe

I have a dataframe with a timeindex and 3 columns containing the coordinates of a 3D vector:

                         x         y         z
ts
2014-05-15 10:38  0.120117  0.987305  0.116211
2014-05-15 10:39  0.117188  0.984375  0.122070
2014-05-15 10:40  0.119141  0.987305  0.119141
2014-05-15 10:41  0.116211  0.984375  0.120117
2014-05-15 10:42  0.119141  0.983398  0.118164

I would like to apply a transformation to each row that also returns a vector:

def myfunc(a, b, c):
    # do something that computes e, f, g
    return e, f, g

but if I do:

df.apply(myfunc, axis=1) 

I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new DataFrame with 3 columns?

Edit:

All solutions below work. The Series solution allows for column names; the list solution seems to execute faster.

def myfunc1(args):
    e = args[0] + 2*args[1]
    f = args[1]*args[2] + 1
    g = args[2] + args[0] * args[1]
    return pd.Series([e, f, g], index=['a', 'b', 'c'])

def myfunc2(args):
    e = args[0] + 2*args[1]
    f = args[1]*args[2] + 1
    g = args[2] + args[0] * args[1]
    return [e, f, g]

%timeit df.apply(myfunc1, axis=1)
100 loops, best of 3: 4.51 ms per loop

%timeit df.apply(myfunc2, axis=1)
100 loops, best of 3: 2.75 ms per loop
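The timings above come from 2014-era pandas. A hedged note for newer versions: as far as I know, since pandas 0.23 a function that returns a plain list is kept as a Series of lists by default, and `result_type='expand'` is the documented way to spread the values across columns. The sketch below uses made-up numbers and addresses columns by label instead of position; both choices are assumptions for illustration, not part of the original post.

```python
import pandas as pd

# Illustrative data (not the frame from the question).
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]})

def myfunc2(row):
    # Same arithmetic as myfunc2 above, using column labels.
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    return [e, f, g]

# result_type="expand" turns each returned list into one row of a DataFrame.
expanded = df.apply(myfunc2, axis=1, result_type="expand")

# The expanded columns are numbered 0, 1, 2; rename them if labels are wanted.
expanded.columns = ["a", "b", "c"]
print(expanded)
```

The list version still needs the rename step to get column names, which is the trade-off against returning a Series.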
Fra asked May 15 '14 23:05



2 Answers

Return a Series and apply will put the values in a DataFrame.

def myfunc(a, b, c):
    # do something that computes e, f, g
    return pd.Series([e, f, g])

This has the bonus that you can give labels to each of the resulting columns. If you return a DataFrame it just inserts multiple rows for the group.
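A minimal runnable sketch of this approach, assuming the first two rows of the x/y/z frame from the question and the arithmetic from the question's edit. The output labels `e`, `f`, `g` are illustrative. Note that with `axis=1`, apply passes each row to the function as a single Series, so the function here takes one argument instead of three.

```python
import pandas as pd

# First two rows of the frame shown in the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188],
        "y": [0.987305, 0.984375],
        "z": [0.116211, 0.122070],
    },
    index=pd.to_datetime(["2014-05-15 10:38", "2014-05-15 10:39"]),
)

def myfunc(row):
    # row is a Series holding one row; read the coordinates by label.
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    # The index of the returned Series becomes the column names.
    return pd.Series([e, f, g], index=["e", "f", "g"])

result = df.apply(myfunc, axis=1)
print(result)
```

The result keeps the original time index and has one column per label in the returned Series.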

U2EF1 answered Oct 26 '22 03:10


Based on the excellent answer by @U2EF1, I've created a handy function that applies a specified function that returns tuples to a dataframe field, and expands the result back to the dataframe.

def apply_and_concat(dataframe, field, func, column_names):
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)

Usage:

df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
print(df)
   A
a  1
b  2
c  3

def func(x):
    return x*x, x*x*x

print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))
   A  x^2  x^3
a  1    1    1
b  2    4    8
c  3    9   27

Hope it helps someone.

Dennis Golomazov answered Oct 26 '22 01:10