Pandas apply convolve by group of rows

I have a pandas dataframe like the one below:

   profile_index  point_index         z              x             y
0              0            1 -0.885429  297903.323027  6.669492e+06
1              0            2 -0.820151  297904.117752  6.669492e+06
2              0            3 -0.729671  297904.912476  6.669491e+06
3              0            4 -0.649332  297905.707201  6.669490e+06
4              1            1 -0.692186  297906.501926  6.669490e+06
5              1            2 -0.885429  297903.323027  6.669492e+06
6              1            3 -0.820151  297904.117752  6.669492e+06
7              1            4 -0.649332  297905.707201  6.669490e+06

I want to create a new "z_gauss" column by convolving (with numpy.convolve) the z values of each group of rows sharing the same "profile_index" with a Gaussian filter.
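
The post does not show how gaussian is built. One common choice, assumed here, is a Gaussian sampled at integer offsets and normalized to sum to 1, so that smoothing does not shift the overall level of z. The helper gaussian_kernel and its sigma and radius parameters are hypothetical, not part of the original post.

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Hypothetical helper: sample exp(-t^2 / (2 * sigma^2)) at the integer
    offsets -radius..radius, then normalize so the weights sum to 1."""
    t = np.arange(-radius, radius + 1, dtype=float)
    weights = np.exp(-t**2 / (2.0 * sigma**2))
    return weights / weights.sum()

# A 3-tap kernel, short enough for the 4-point profiles in the table above.
gaussian = gaussian_kernel(sigma=1.0, radius=1)
print(gaussian.shape)  # (3,)
```

With mode 'same', np.convolve returns max(len(z), len(kernel)) values, so a kernel longer than the shortest profile would no longer produce one value per row of that profile.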

I've tried something like:

data["z_gauss"] = data.groupby('profile_index').apply(lambda x: np.convolve(x, gaussian, 'same'))

where gaussian is my Gaussian filter (a 1-D vector). But I get errors such as ValueError: object too deep for desired array
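
A minimal sketch of where that error comes from, using a small hypothetical stand-in for data and an assumed 3-tap kernel: the lambda in groupby(...).apply receives each group as a multi-column sub-DataFrame, which NumPy converts to a 2-D array, while np.convolve only accepts 1-D inputs. Passing a single column works.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: two profiles, several columns.
data = pd.DataFrame({
    "profile_index": [0, 0, 0, 1, 1, 1],
    "z": [-0.885429, -0.820151, -0.729671, -0.692186, -0.885429, -0.820151],
    "x": [297903.3, 297904.1, 297904.9, 297906.5, 297903.3, 297904.1],
})
gaussian = np.array([0.25, 0.5, 0.25])  # assumed 3-tap smoothing kernel

# One group as a multi-column sub-DataFrame, the kind of object apply passes to the lambda.
group = data[data["profile_index"] == 0]

error = None
try:
    np.convolve(group, gaussian, "same")  # 2-D input (3 rows x 3 columns) is rejected
except ValueError as exc:
    error = exc
print(type(error).__name__, error)

# Selecting one column gives a 1-D input, which np.convolve accepts.
smoothed = np.convolve(group["z"], gaussian, "same")
print(smoothed.shape)  # (3,)
```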

Do you have any advice or hints on how to proceed? Should I split my dataframe into separate ones?

Beinje asked Dec 03 '19 17:12



1 Answer

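You want to use transform instead of apply, on the z column only. np.convolve only accepts 1-D inputs, but in your attempt the lambda receives each group as a whole sub-DataFrame (all columns), which is 2-D and triggers the "object too deep" error. Selecting ['z'] passes a 1-D Series instead. And unlike apply, which returns one array (the length of the group) per group, transform returns one value per original row, aligned with the dataframe's index, so the result can be assigned directly as a new column: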

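import numpy as np
# convolve each profile's 1-D 'z' values; transform keeps the original row index
data["z_gauss"] = (data.groupby('profile_index')['z']
                   .transform(lambda x: np.convolve(x, gaussian, 'same')))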
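
A self-contained sketch of this approach. The z values are copied from the question's table (x and y are left out because they do not affect the result), and gaussian is an assumed 3-tap kernel, not the asker's actual filter. The last lines contrast transform with apply to show the difference in output length.

```python
import numpy as np
import pandas as pd

# z values from the question's table; x and y omitted.
data = pd.DataFrame({
    "profile_index": [0, 0, 0, 0, 1, 1, 1, 1],
    "point_index":   [1, 2, 3, 4, 1, 2, 3, 4],
    "z": [-0.885429, -0.820151, -0.729671, -0.649332,
          -0.692186, -0.885429, -0.820151, -0.649332],
})
# Assumed kernel: 3 normalized taps, no longer than the shortest profile.
gaussian = np.array([0.25, 0.5, 0.25])

# transform calls the lambda once per profile on the 1-D "z" Series and
# returns a Series aligned with data's index, so it can be assigned as a column.
data["z_gauss"] = (data.groupby("profile_index")["z"]
                   .transform(lambda x: np.convolve(x, gaussian, "same")))

# For contrast, apply returns one array per group (2 entries, not 8 rows).
per_group = data.groupby("profile_index")["z"].apply(
    lambda x: np.convolve(x, gaussian, "same"))
print(len(per_group), len(data["z_gauss"]))  # 2 8
```

Two points worth keeping in mind: in 'same' mode np.convolve treats values beyond each end of a profile as zeros, so the first and last smoothed values of every profile are pulled toward 0; and if the kernel is longer than a profile, 'same' returns len(kernel) values and transform will fail because the result no longer matches the group's rows.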
Brandon answered Oct 05 '22 01:10