I was using a function that calculates wind speed from its lon and lat components:
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)
and calling it to calculate a new pandas column from two existing ones:
df['wspeed'] = map(wind_speed, df['lonwind'], df['latwind'])
Since I switched from Python 2.7 to Python 3.5, the function no longer works. Could the change be the cause?
For a single-argument (single-column) function:
def celsius(T):
    return round(T - 273, 1)
I am now using:
df['temp'] = df['t2m'].map(celsius)
And it works fine.
Could you help me?
If you want to use map, wrap the call in list:
df = pd.DataFrame({'lonwind':[1,2,3],
'latwind':[4,5,6]})
print (df)
latwind lonwind
0 4 1
1 5 2
2 6 3
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)
df['wspeed'] = list(map(wind_speed, df['lonwind'], df['latwind']))
print (df)
latwind lonwind wspeed
0 4 1 4.123106
1 5 2 5.385165
2 6 3 6.708204
Without list:
df['wspeed'] = (map(wind_speed, df['lonwind'], df['latwind']))
print (df)
latwind lonwind wspeed
0 4 1 <map object at 0x000000000AC42DA0>
1 5 2 <map object at 0x000000000AC42DA0>
2 6 3 <map object at 0x000000000AC42DA0>
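This happens because in Python 3 map returns a lazy iterator, whereas in Python 2 it returned a list. From the Python 3 documentation:
map(function, iterable, ...)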
Return an iterator that applies function to every item of iterable, yielding the results. If additional iterable arguments are passed, function must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see itertools.starmap().
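As a minimal sketch of the behaviour the docs describe, the snippet below uses plain Python lists with made-up sample values (u_values, v_values and pairs are illustrative names, not from the question). It shows that map gives back an iterator that has to be consumed, and that itertools.starmap does the same job when the arguments are already paired up:

```python
import math
from itertools import starmap

def wind_speed(u, v):
    # Scalar version, using math.sqrt so the sketch needs no third-party imports
    return math.sqrt(u ** 2 + v ** 2)

# Made-up sample values, for illustration only
u_values = [1, 2, 3]
v_values = [4, 5, 6]

# In Python 3 this is a lazy iterator, not a list
lazy = map(wind_speed, u_values, v_values)
is_list = isinstance(lazy, list)

# Consuming the iterator produces the actual values
speeds = list(lazy)

# When the inputs are already arranged as (u, v) tuples, starmap unpacks them
pairs = list(zip(u_values, v_values))
speeds_from_pairs = list(starmap(wind_speed, pairs))

print(is_list)
print(speeds)
```

Assigning the unconsumed map object to a column is what puts the same `<map object ...>` into every row, which is why the list call is needed.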
Another solution:
df['wspeed'] = (df['lonwind'] ** 2 + df['latwind'] ** 2) ** 0.5
print (df)
latwind lonwind wspeed
0 4 1 4.123106
1 5 2 5.385165
2 6 3 6.708204
I would try to stick to existing numpy/scipy functions as they are extremely fast and optimized (numpy.hypot):
df['wspeed'] = np.hypot(df.latwind, df.lonwind)
Timing against a DataFrame with 300K rows:
In [47]: df = pd.concat([df] * 10**5, ignore_index=True)
In [48]: df.shape
Out[48]: (300000, 2)
In [49]: %paste
def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)
## -- End pasted text --
In [50]: %timeit list(map(wind_speed, df['lonwind'], df['latwind']))
1 loop, best of 3: 922 ms per loop
In [51]: %timeit np.hypot(df.latwind, df.lonwind)
100 loops, best of 3: 4.08 ms per loop
Conclusion: the vectorized approach was about 226 times faster (922 ms vs. 4.08 ms).
If you have to write your own function, try to use vectorized math (working with whole vectors / columns instead of scalars):
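Outside IPython, a comparable measurement can be sketched with the standard timeit module. The frame below is deliberately smaller than the one timed above (30,000 rows, an arbitrary choice to keep the run short), and the absolute numbers will differ from machine to machine, so treat this as a rough check rather than a benchmark:

```python
import timeit

import numpy as np
import pandas as pd

def wind_speed(u, v):
    return np.sqrt(u ** 2 + v ** 2)

# Same toy data as above, repeated to 30,000 rows (size chosen arbitrarily)
df = pd.DataFrame({'lonwind': [1, 2, 3], 'latwind': [4, 5, 6]})
df = pd.concat([df] * 10**4, ignore_index=True)

# Both approaches should produce the same values
mapped = list(map(wind_speed, df['lonwind'], df['latwind']))
vectorized = np.hypot(df['latwind'], df['lonwind'])
same = np.allclose(mapped, vectorized)

# Time three runs of each approach
t_map = timeit.timeit(
    lambda: list(map(wind_speed, df['lonwind'], df['latwind'])), number=3)
t_vec = timeit.timeit(
    lambda: np.hypot(df['latwind'], df['lonwind']), number=3)

print('results match:', same)
print('map: %.4f s, np.hypot: %.4f s' % (t_map, t_vec))
```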
def wind_speed(u, v):
    # vectorized approach: math on whole columns instead of scalars
    return np.sqrt(u * u + v * v)
df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])
Demo:
In [39]: df['wspeed'] = wind_speed(df['lonwind'] , df['latwind'])
In [40]: df
Out[40]:
latwind lonwind wspeed
0 4 1 4.123106
1 5 2 5.385165
2 6 3 6.708204
The same vectorized approach with the celsius() function:
def celsius(T):
    # using vectorized function: np.round()
    return np.round(T - 273, 1)
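A short usage sketch for the vectorized celsius(), assuming a t2m column in kelvin as in the question (the three temperatures are made-up values). The function is called once on the whole column, with no map involved:

```python
import numpy as np
import pandas as pd

def celsius(T):
    # np.round works on a whole Series at once
    return np.round(T - 273, 1)

# Made-up temperatures in kelvin, for illustration only
df = pd.DataFrame({'t2m': [273.0, 280.46, 300.0]})

# One call on the entire column
df['temp'] = celsius(df['t2m'])
print(df)
```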