
Apply a function row-wise to a pandas DataFrame

I have to calculate the distance along a Hilbert curve from 2D coordinates. Using the hilbertcurve package, I built my own "hilbert" function to do so. The coordinates are stored in a dataframe (col_1 and col_2). As you can see, my function works when applied to two scalar values (test).

However, it does not work when applied row-wise via apply. Why is this? What am I doing wrong here? I need an additional column "hilbert" containing the Hilbert distances computed from the x- and y-coordinates given in columns "col_1" and "col_2".

import pandas as pd
from hilbertcurve.hilbertcurve import HilbertCurve

df = pd.DataFrame({'ID': ['1', '2', '3'],
                   'col_1': [0, 2, 3],
                   'col_2': [1, 4, 5]})


def hilbert(x, y):
    n = 2
    p = 7
    hilcur = HilbertCurve(p, n)
    dist = hilcur.distance_from_coordinates([x, y])
    return dist


test = hilbert(df.col_1[2], df.col_2[2])

df["hilbert"] = df.apply(hilbert(df.col_1, df.col_2), axis=0)

The last command fails with this error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Thank you for your help!

Scrabyard asked Dec 31 '22 01:12


2 Answers

Since you wrote hilbert(df.col_1, df.col_2) inside the apply call, Python evaluates that expression first. It calls your function with the two full pd.Series objects before apply ever runs, and that triggers the error. What you should be doing is:

df.apply(lambda x: hilbert(x['col_1'], x['col_2']), axis=1)

so that the lambda function given will be applied to each row.
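To make the difference concrete, here is a minimal sketch that reproduces the error without the hilbertcurve package. The function scalar_only is a hypothetical stand-in for hilbert, and it assumes the real function, like most code written for scalars, ends up needing a single True/False from a comparison somewhere inside.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'],
                   'col_1': [0, 2, 3],
                   'col_2': [1, 4, 5]})


def scalar_only(x, y):
    # Hypothetical stand-in for hilbert(): the `if` needs one True/False,
    # which only works when x and y are single values.
    if x > y:
        return x - y
    return y - x


# Arguments are evaluated before apply() is called, so this line runs
# scalar_only(Series, Series) and raises before apply() is ever reached.
try:
    df.apply(scalar_only(df.col_1, df.col_2), axis=1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Passing a callable defers the call: apply() invokes it once per row,
# and row['col_1'] / row['col_2'] are single values.
result = df.apply(lambda row: scalar_only(row['col_1'], row['col_2']), axis=1)
```

Here error_message holds the same "truth value of a Series is ambiguous" text as in the question, while result is a Series with one value per row.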

Randy answered Jan 11 '23 21:01


You have to set axis=1, because you want to apply your function to each row, not to each column.

You can define a lambda function that calls hilbert with only the two coordinate columns of each row, like this:

df['hilbert'] = df.apply(lambda row: hilbert(row['col_1'], row['col_2']), axis=1)
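As a possible alternative, the same column can be built without apply by iterating over the two columns in parallel. This is only a sketch: hilbert_stub is a hypothetical placeholder for the question's hilbert function so that the snippet runs without the hilbertcurve package, and its arithmetic has nothing to do with real Hilbert distances.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'],
                   'col_1': [0, 2, 3],
                   'col_2': [1, 4, 5]})


def hilbert_stub(x, y):
    # Hypothetical placeholder for hilbert(x, y); swap in the real function.
    return x * 10 + y


# zip pairs up the values of the two columns, so the function is
# called once per row with two single values.
df['hilbert'] = [hilbert_stub(x, y) for x, y in zip(df['col_1'], df['col_2'])]
```

On larger dataframes this form is often faster than apply with axis=1, because it avoids building a Series object for every row. The values arrive as NumPy integers here, which may or may not matter to the function being called.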
Athina Barbul answered Jan 11 '23 20:01