Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas apply function to multiple columns and multiple rows

Tags:

python

pandas

I have a dataframe with consecutive pixel coordinates in rows and columns 'xpos', 'ypos', and I want to calculate the angle in degrees of each path between consecutive pixels. Currently I have the solution presented below, which works fine and for teh size of my file is speedy enough, but iterating through all the rows seems not to be the pandas way to do it. I know how to apply a function to different columns, and how to apply functions to different rows of columns, but can't figure out how to combine both.

here's my code:

fix_df = pd.read_csv('fixations_out.csv')

# wyliczanie kąta sakady
temp_list=[]
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        x2 = df['xpos'].ix[count-1]
        y2 = df['ypos'].ix[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)

and then I insert temp list into df

EDIT: after implementing the tip from the comment I have:

df['diff_x'] = df['xpos'].shift() - df['xpos']
df['diff_y'] = df['ypos'].shift() - df['ypos']

def calc_angle(x):
    try:
        a = abs(180/math.pi * math.atan((x.diff_y)/(x.diff_x)))
        return a
    except ZeroDivisionError:
        return 0

df['angle_degrees'] = df.apply(calc_angle, axis=1)

I compared the time of three solutions for my df (the size of the df is about 6k rows), the iteration is almost 9 times slower than apply, and about 1500 times slower then doing it without apply:

execution time of the solution with iteration, including insert of a new column back to df: 1,51s

execution time of the solution without iteration, with apply: 0.17s

execution time of accepted answer by EdChum using diff(), without iteration and without apply: 0.001s

Suggestion: do not use iteration or apply and always try to use vectorized calculation ;) it is not only faster, but also more readable.

like image 864
yemu Avatar asked Jun 13 '14 09:06

yemu


People also ask

How do I apply a lambda function to all columns in pandas?

We can do this with the apply() function in Pandas. We can use the apply() function to apply the lambda function to both rows and columns of a dataframe. If the axis argument in the apply() function is 0, then the lambda function gets applied to each column, and if 1, then the function gets applied to each row.

Is pandas apply faster than Iterrows?

By using apply and specifying one as the axis, we can run a function on every row of a dataframe. This solution also uses looping to get the job done, but apply has been optimized better than iterrows , which results in faster runtimes.

What is the difference between apply and Applymap in pandas?

apply() is used to apply a function along an axis of the DataFrame or on values of Series. applymap() is used to apply a function to a DataFrame elementwise.


1 Answers

You can do this via the following method and I compared the pandas way to your way and it is over 1000 times faster, and that is without adding the list back as a new column! This was done on a 10000 row dataframe

In [108]:

%%timeit
import numpy as np
df['angle'] = np.abs(180/math.pi * np.arctan(df['xpos'].shift() - df['xpos']/df['ypos'].shift() - df['ypos']))

1000 loops, best of 3: 1.27 ms per loop

In [100]:

%%timeit
temp_list=[]
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        x2 = df['xpos'].ix[count-1]
        y2 = df['ypos'].ix[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)
1 loops, best of 3: 1.29 s per loop

Also if possible avoid using apply, as this operates row-wise, if you can find a vectorised method that can work on the entire series or dataframe then always prefer this.

UPDATE

seeing as you are just doing a subtraction from the previous row there is built in method for this diff this results in even faster code:

In [117]:

%%timeit
import numpy as np
df['angle'] = np.abs(180/math.pi * np.arctan(df['xpos'].diff(1)/df['ypos'].diff(1)))

1000 loops, best of 3: 1.01 ms per loop

Another update

There is also a build in method for series and dataframe division, this now shaves more time off and I achieve sub 1ms time:

In [9]:

%%timeit
import numpy as np
df['angle'] = np.abs(180/math.pi * np.arctan(df['xpos'].diff(1).div(df['ypos'].diff(1))))

1000 loops, best of 3: 951 µs per loop
like image 184
EdChum Avatar answered Oct 26 '22 04:10

EdChum