
How do I calculate a pandas column with multiple columns as arguments?

Tags: python, pandas

I was using a function that calculates wind speed from its longitudinal and latitudinal components:

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

and calling it to calculate a new pandas column from two existing ones:

df['wspeed'] = map(wind_speed, df['lonwind'], df['latwind'])

Since I switched from Python 2.7 to Python 3.5, this no longer works. Could the upgrade be the cause?

For a function that takes a single argument (one column):

def celsius(T):
    return round(T - 273, 1)

I am now using:

df['temp'] = df['t2m'].map(celsius)

And it works fine.

Could you help me?

Hugo asked Sep 18 '25 17:09


2 Answers

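If you want to keep using map, wrap the call in list(), because in Python 3 map returns an iterator instead of a list:

import numpy as np
import pandas as pd

df = pd.DataFrame({'lonwind':[1,2,3],
                   'latwind':[4,5,6]})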

print (df)
   latwind  lonwind
0        4        1
1        5        2
2        6        3

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

df['wspeed'] = list(map(wind_speed, df['lonwind'], df['latwind']))

print (df)
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204

Without list, the map object itself ends up in every row:

df['wspeed'] = (map(wind_speed, df['lonwind'], df['latwind']))
print (df)
   latwind  lonwind                              wspeed
0        4        1  <map object at 0x000000000AC42DA0>
1        5        2  <map object at 0x000000000AC42DA0>
2        6        3  <map object at 0x000000000AC42DA0>

map(function, iterable, ...)

Return an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see itertools.starmap().
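
The behaviour described in that passage can be checked without pandas. The following is a minimal sketch with made-up lists (the values are illustrative, not taken from the question): in Python 3, map returns a lazy iterator that has to be consumed, it stops at the shortest input, and itertools.starmap handles arguments that are already paired up.

```python
import math
from itertools import starmap

def wind_speed(u, v):
    # scalar version of the function from the question
    return math.sqrt(u ** 2 + v ** 2)

u = [3, 6, 5]   # illustrative values
v = [4, 8]      # deliberately one element shorter than u

lazy = map(wind_speed, u, v)
is_list = isinstance(lazy, list)   # False in Python 3: map returns an iterator

# Consuming the iterator produces the values; there are only two of them,
# because map stops when the shortest iterable is exhausted.
speeds = list(lazy)

# starmap takes the arguments as ready-made tuples
pairs = [(3, 4), (6, 8)]
speeds_from_pairs = list(starmap(wind_speed, pairs))

print(is_list, speeds, speeds_from_pairs)
```

This is why assigning the bare map object to a column does not give the computed values, while list(map(...)) does.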

Another solution is to skip map entirely and do the arithmetic on whole columns:

df['wspeed'] = (df['lonwind'] ** 2 + df['latwind'] ** 2) ** 0.5
print (df)
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204
jezrael answered Sep 20 '25 07:09


I would stick to existing numpy/scipy functions where possible, as they are fast and well optimized. Here that means numpy.hypot:

df['wspeed'] = np.hypot(df.latwind, df.lonwind)
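
As a quick sanity check, np.hypot(x, y) computes sqrt(x**2 + y**2) element-wise, so the argument order does not matter and the result should match the hand-written formula. A small sketch, rebuilding the three-row frame from the first answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'lonwind': [1, 2, 3],
                   'latwind': [4, 5, 6]})

# hand-written formula on whole columns
by_hand = np.sqrt(df['lonwind'] ** 2 + df['latwind'] ** 2)

# the same quantity via the numpy ufunc; passing Series returns a Series
with_hypot = np.hypot(df['latwind'], df['lonwind'])

same = np.allclose(by_hand, with_hypot)
df['wspeed'] = with_hypot
print(same)
print(df)
```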

Timing against a 300K-row DataFrame:

In [47]: df = pd.concat([df] * 10**5, ignore_index=True)

In [48]: df.shape
Out[48]: (300000, 2)

In [49]: %paste
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

## -- End pasted text --

In [50]: %timeit list(map(wind_speed, df['lonwind'], df['latwind']))
1 loop, best of 3: 922 ms per loop

In [51]: %timeit np.hypot(df.latwind, df.lonwind)
100 loops, best of 3: 4.08 ms per loop

Conclusion: the vectorized approach was roughly 226 times faster (922 ms vs. 4.08 ms).
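
To repeat the comparison outside IPython, something along these lines should work. It is only a sketch: it uses the standard timeit module instead of %timeit, and n_copies is an assumed, smaller size chosen to keep the run short, so the absolute numbers and the ratio will differ from those above and from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

base = pd.DataFrame({'lonwind': [1, 2, 3],
                     'latwind': [4, 5, 6]})
n_copies = 10**4   # assumption: smaller than the 10**5 used in the session above
df = pd.concat([base] * n_copies, ignore_index=True)

runs = 3
# element-by-element: one Python-level function call per row
t_map = timeit.timeit(
    lambda: list(map(wind_speed, df['lonwind'], df['latwind'])), number=runs)
# vectorized: a single call over the whole columns
t_hypot = timeit.timeit(
    lambda: np.hypot(df['latwind'], df['lonwind']), number=runs)

print(f"map:   {t_map / runs:.4f} s per run")
print(f"hypot: {t_hypot / runs:.4f} s per run")
print(f"ratio: {t_map / t_hypot:.0f}x")
```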

If you have to write your own function, try to use vectorized math, i.e. operate on whole vectors / columns instead of scalars:

def wind_speed(u, v):
    # vectorized approach: math on whole columns instead of scalars
    return np.sqrt(u * u + v * v)

df['wspeed'] = wind_speed(df['lonwind'], df['latwind'])

Demo:

In [39]: df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])

In [40]: df
Out[40]:
   latwind  lonwind    wspeed
0        4        1  4.123106
1        5        2  5.385165
2        6        3  6.708204

The same vectorized approach works for the celsius() function:

def celsius(T):
    # using vectorized function: np.round()
    return np.round(T - 273, 1)
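
A short usage sketch for the vectorized celsius(): the function is called once on the whole column rather than once per value. The 't2m' values below are made up for illustration.

```python
import numpy as np
import pandas as pd

def celsius(T):
    # np.round works element-wise on a whole Series
    return np.round(T - 273, 1)

# illustrative temperatures in kelvin (not from the question)
df = pd.DataFrame({'t2m': [300.0, 285.46, 260.27]})

# one call on the full column, no map needed
df['temp'] = celsius(df['t2m'])
print(df)
```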
MaxU - stop WAR against UA answered Sep 20 '25 05:09