I have a dataframe with a timeindex and 3 columns containing the coordinates of a 3D vector:
                             x         y         z
    ts
    2014-05-15 10:38  0.120117  0.987305  0.116211
    2014-05-15 10:39  0.117188  0.984375  0.122070
    2014-05-15 10:40  0.119141  0.987305  0.119141
    2014-05-15 10:41  0.116211  0.984375  0.120117
    2014-05-15 10:42  0.119141  0.983398  0.118164
I would like to apply a transformation to each row that also returns a vector:
    def myfunc(a, b, c):
        # do something
        return e, f, g
but if I do:
df.apply(myfunc, axis=1)
I end up with a pandas Series whose elements are tuples. This is because apply takes the result of myfunc without unpacking it. How can I change myfunc so that I obtain a new DataFrame with 3 columns?
Edit:
All solutions below work. The Series solution allows for column names; the list solution seems to execute faster.
    def myfunc1(args):
        e = args[0] + 2*args[1]
        f = args[1]*args[2] + 1
        g = args[2] + args[0] * args[1]
        return pd.Series([e, f, g], index=['a', 'b', 'c'])

    def myfunc2(args):
        e = args[0] + 2*args[1]
        f = args[1]*args[2] + 1
        g = args[2] + args[0] * args[1]
        return [e, f, g]

    %timeit df.apply(myfunc1, axis=1)
    100 loops, best of 3: 4.51 ms per loop

    %timeit df.apply(myfunc2, axis=1)
    100 loops, best of 3: 2.75 ms per loop
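A note on the list variant: on recent pandas versions, a function that returns a plain list from `apply(..., axis=1)` produces a Series of lists by default rather than a DataFrame. The sketch below assumes a pandas version that supports the `result_type` argument and uses `result_type="expand"` to spread each list over columns. The two-row sample frame and the label-based row access are my own choices for illustration, not part of the original timing test.

```python
import pandas as pd

# Small sample frame using the first two rows from the question.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188],
        "y": [0.987305, 0.984375],
        "z": [0.116211, 0.122070],
    },
    index=pd.to_datetime(["2014-05-15 10:38", "2014-05-15 10:39"]),
)


def myfunc2(row):
    # Access by label instead of position (an illustrative choice).
    e = row["x"] + 2 * row["y"]
    f = row["y"] * row["z"] + 1
    g = row["z"] + row["x"] * row["y"]
    return [e, f, g]


# result_type="expand" turns each list-like result into columns.
expanded = df.apply(myfunc2, axis=1, result_type="expand")

# The expanded columns are numbered 0, 1, 2, so name them explicitly.
expanded.columns = ["a", "b", "c"]
print(expanded)
```

The list version does not carry column names, so they have to be assigned afterwards as shown.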
Return Multiple Columns from pandas apply()
You can return a Series containing the new data from the function passed to apply(). Pass axis=1 to apply() so that the function is applied to each row of the DataFrame; the Series returned for each row are then combined into a DataFrame with multiple columns.
We can also add multiple rows using pandas.concat(): create a new DataFrame holding all the rows to add, then concatenate it with the original DataFrame.
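As a minimal sketch of that idea, with made-up column names and values:

```python
import pandas as pd

# Original frame and a second frame holding the rows to add (illustrative data).
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
new_rows = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# ignore_index=True renumbers the combined index from 0.
combined = pd.concat([df, new_rows], ignore_index=True)
print(combined)
```

Without `ignore_index=True` the original index labels of both frames are kept, which can leave duplicate labels.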
We can select multiple rows in pandas by passing a combined condition to the .loc indexer. Comparisons built with >, <, <=, >= and == can be joined with logical AND (&) and OR (|) to extract rows that match multiple filters.
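A short sketch of such a selection, using invented values and thresholds. Each comparison needs its own parentheses because `&` and `|` bind more tightly than the comparison operators:

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"x": [0.120, 0.117, 0.119], "y": [0.987, 0.984, 0.983]})

# AND: both conditions must hold for a row to be kept.
both = df.loc[(df["x"] > 0.118) & (df["y"] > 0.985)]

# OR: at least one condition must hold.
either = df.loc[(df["x"] > 0.118) | (df["y"] > 0.985)]

print(both)
print(either)
```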
Return a Series and apply will put the values in a DataFrame.
    def myfunc(a, b, c):
        # do something
        return pd.Series([e, f, g])
This has the bonus that you can give labels to each of the resulting columns. If you return a DataFrame it just inserts multiple rows for the group.
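To make the labeling point concrete, here is a runnable sketch. It assumes the function receives the whole row as a single argument (which is how `apply` with `axis=1` calls it) and borrows the arithmetic from the question's edit; the labels `a`, `b`, `c` are arbitrary.

```python
import pandas as pd

# First two rows from the question's data.
df = pd.DataFrame(
    {
        "x": [0.120117, 0.117188],
        "y": [0.987305, 0.984375],
        "z": [0.116211, 0.122070],
    },
    index=pd.to_datetime(["2014-05-15 10:38", "2014-05-15 10:39"]),
)


def myfunc(row):
    # apply(axis=1) passes each row as one Series, so unpack it here.
    a, b, c = row["x"], row["y"], row["z"]
    e = a + 2 * b
    f = b * c + 1
    g = c + a * b
    # The index of the returned Series becomes the column names.
    return pd.Series([e, f, g], index=["a", "b", "c"])


result = df.apply(myfunc, axis=1)
print(result)
```

The result keeps the original time index and has one column per label in the returned Series.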
Based on the excellent answer by @U2EF1, I've created a handy function that applies a specified function that returns tuples to a dataframe field, and expands the result back to the dataframe.
    def apply_and_concat(dataframe, field, func, column_names):
        return pd.concat((
            dataframe,
            dataframe[field].apply(
                lambda cell: pd.Series(func(cell), index=column_names))), axis=1)
Usage:
    df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
    print(df)
       A
    a  1
    b  2
    c  3

    def func(x):
        return x*x, x*x*x

    print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))
       A  x^2  x^3
    a  1    1    1
    b  2    4    8
    c  3    9   27
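A possible variation, not from the original answer: instead of building one Series per cell, collect the tuples in a list and construct the extra columns in a single DataFrame call. The helper name `apply_and_join` is hypothetical, and I have not benchmarked it against the version above.

```python
import pandas as pd


def apply_and_join(dataframe, field, func, column_names):
    # Hypothetical helper: call func on every cell and keep the tuples.
    results = [func(cell) for cell in dataframe[field]]
    # Build all new columns at once, aligned on the original index.
    expanded = pd.DataFrame(results, index=dataframe.index, columns=column_names)
    return dataframe.join(expanded)


def func(x):
    return x * x, x * x * x


df = pd.DataFrame([1, 2, 3], index=["a", "b", "c"], columns=["A"])
out = apply_and_join(df, "A", func, ["x^2", "x^3"])
print(out)
```

This avoids creating a Series for every row, which is likely where the Series approach spends much of its time.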
Hope it helps someone.