I have a DataFrame with timestamped temperature and wind speed values, and a function to convert those into a "wind chill." I'm using iterrows to run the function on each row, and hoping to get a DataFrame out with a nifty "Wind Chill" column.
However, while it seems to work as it iterates, and has actually "worked" at least once, I can't replicate it consistently. I suspect I'm missing something about how DataFrames are structured in general, and I'm hoping someone can help.
In [28]: bigdf.head()
Out[28]:
Day Temperature Wind Speed Year
2003-03-01 06:00:00-05:00 1 30.27 5.27 2003
2003-03-01 07:00:00-05:00 1 30.21 4.83 2003
2003-03-01 08:00:00-05:00 1 31.81 6.09 2003
2003-03-01 09:00:00-05:00 1 34.04 6.61 2003
2003-03-01 10:00:00-05:00 1 35.31 6.97 2003
So I add a 'Wind Chill' column to bigdf and prepopulate it with NaN:
In [29]: bigdf['Wind Chill'] = np.nan
Then I iterate over the rows to add the actual wind chills:
In [30]: for row_index, row in bigdf[:5].iterrows():
    ...:     row['Wind Chill'] = windchill(row['Temperature'], row['Wind Speed'])
    ...:     print(row['Wind Chill'])
    ...:
24.7945889994
25.1365267133
25.934114012
28.2194307516
29.5051046953
As you can see, the new values appear to be applied to the 'Wind Chill' column. Here's the windchill function, in case it helps:
def windchill(temp, wind):
    if temp > 50 or wind <= 3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16
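For reference, the function is piecewise. The else branch appears to be the US National Weather Service wind chill formula, which assumes T is air temperature in °F and V is wind speed in mph; the units are an assumption, since the question doesn't state them:

```latex
WC(T, V) =
\begin{cases}
T, & T > 50 \text{ or } V \le 3 \\
35.74 + 0.6215\,T - 35.75\,V^{0.16} + 0.4275\,T\,V^{0.16}, & \text{otherwise}
\end{cases}
```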
But when I look at the DataFrame again, the NaNs are still there:
In [31]: bigdf.head()
Out[31]:
Day Temperature Wind Speed Year Wind Chill
2003-03-01 06:00:00-05:00 1 30.27 5.27 2003 NaN
2003-03-01 07:00:00-05:00 1 30.21 4.83 2003 NaN
2003-03-01 08:00:00-05:00 1 31.81 6.09 2003 NaN
2003-03-01 09:00:00-05:00 1 34.04 6.61 2003 NaN
2003-03-01 10:00:00-05:00 1 35.31 6.97 2003 NaN
What's even weirder is that it has worked once or twice, and I can't tell what I did differently.
I must admit I'm not especially familiar with the inner workings of pandas, and get confused with indexing, etc., so I feel like I'm probably missing something very basic here (or doing this the hard way).
Thanks!
You can use apply with axis=1 to do this. It calls the function once per row and returns a Series indexed like the DataFrame:
In [11]: df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
    ...:          axis=1)
Out[11]:
2003-03-01 06:00:00-05:00 24.794589
2003-03-01 07:00:00-05:00 25.136527
2003-03-01 08:00:00-05:00 25.934114
2003-03-01 09:00:00-05:00 28.219431
2003-03-01 10:00:00-05:00 29.505105
In [12]: df['Wind Chill'] = df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
    ...:                             axis=1)
In [13]: df
Out[13]:
Day Temperature Wind Speed Year Wind Chill
2003-03-01 06:00:00-05:00 1 30.27 5.27 2003 24.794589
2003-03-01 07:00:00-05:00 1 30.21 4.83 2003 25.136527
2003-03-01 08:00:00-05:00 1 31.81 6.09 2003 25.934114
2003-03-01 09:00:00-05:00 1 34.04 6.61 2003 28.219431
2003-03-01 10:00:00-05:00 1 35.31 6.97 2003 29.505105
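Since windchill only does arithmetic and comparisons, you can also skip the per-row Python calls entirely. The sketch below is an alternative to apply, not part of the answer above: windchill_vectorized is a hypothetical helper that re-expresses the same formula with numpy.where, and the small DataFrame just re-types the sample rows from the question.

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Scalar version from the question
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215 * temp - 35.75 * wind ** 0.16 + 0.4275 * temp * wind ** 0.16


def windchill_vectorized(temp, wind):
    # Hypothetical helper: same formula, evaluated on whole columns at once
    temp = np.asarray(temp, dtype=float)
    wind = np.asarray(wind, dtype=float)
    chill = 35.74 + 0.6215 * temp - 35.75 * wind ** 0.16 + 0.4275 * temp * wind ** 0.16
    # Keep the raw temperature wherever windchill() would return it unchanged
    return np.where((temp > 50) | (wind <= 3), temp, chill)


df = pd.DataFrame(
    {
        "Day": [1] * 5,
        "Temperature": [30.27, 30.21, 31.81, 34.04, 35.31],
        "Wind Speed": [5.27, 4.83, 6.09, 6.61, 6.97],
        "Year": [2003] * 5,
    }
)

df["Wind Chill"] = windchill_vectorized(df["Temperature"], df["Wind Speed"])

# Compare against the row-by-row apply version
by_row = df.apply(lambda row: windchill(row["Temperature"], row["Wind Speed"]), axis=1)
print(np.allclose(df["Wind Chill"], by_row))  # True
```

Note that np.where evaluates both branches for every row, so the formula is computed even where its result is discarded; that's harmless here as long as wind speeds are non-negative.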
To expand on the reason for your confusion: the row objects that iterrows yields are copies rather than views of df (for a DataFrame with mixed column dtypes like this one), so changes made to them don't propagate:
In [21]: for _, row in df.iterrows(): row['Day'] = 2
We can see that the change is made successfully to the copy, i.e. the row variable:
In [22]: row
Out[22]:
Day 2.00
Temperature 35.31
Wind Speed 6.97
Year 2003.00
Name: 2003-03-01 10:00:00-05:00
But the change doesn't propagate to the DataFrame:
In [23]: df
Out[23]:
Day Temperature Wind Speed Year
2003-03-01 06:00:00-05:00 1 30.27 5.27 2003
2003-03-01 07:00:00-05:00 1 30.21 4.83 2003
2003-03-01 08:00:00-05:00 1 31.81 6.09 2003
2003-03-01 09:00:00-05:00 1 34.04 6.61 2003
2003-03-01 10:00:00-05:00 1 35.31 6.97 2003
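A self-contained way to check this yourself, assuming a DataFrame with mixed int and float columns like the one in the question. The pandas docs note that whether iterrows gives you a copy depends on the dtypes, so treat writing to row as unsupported either way:

```python
import pandas as pd

# Mixed dtypes (int and float), as in the question
df = pd.DataFrame({"Day": [1, 1, 1], "Temperature": [30.27, 30.21, 31.81]})

for _, row in df.iterrows():
    row["Day"] = 2  # changes only the Series that iterrows yielded

print(row["Day"])          # 2.0 (the row was upcast to float)
print(df["Day"].tolist())  # [1, 1, 1]: df itself is unchanged
```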
The following also leaves df unchanged:
In [24]: row = df.iloc[0]  # also a copy
In [25]: row['Day'] = 2
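Whereas if we take a view, we'll see a change in df:
In [26]: row = df.iloc[2:3]  # this one's a view (in older pandas)
In [27]: row['Day'] = 3
Be aware that this is chained assignment: pandas versions before 3.0 typically emit a SettingWithCopyWarning for it, and under Copy-on-Write (the default from pandas 3.0) it never modifies df, so don't rely on it to write data back.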
See "Returning a view versus a copy" in the pandas indexing documentation.
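If you'd rather keep the loop from the question, a minimal sketch is to write each result into the DataFrame itself with .at (a label-based scalar setter) instead of into row. Likewise, setting a column for a range of rows with a single .loc call writes into the original frame, unlike the chained row['Day'] = 3 above. The DataFrame here just re-types the five sample rows from the question, as an assumption for illustration:

```python
import numpy as np
import pandas as pd


def windchill(temp, wind):
    # Scalar version from the question
    if temp > 50 or wind <= 3:
        return temp
    return 35.74 + 0.6215 * temp - 35.75 * wind ** 0.16 + 0.4275 * temp * wind ** 0.16


# The first five rows shown in the question
bigdf = pd.DataFrame(
    {
        "Day": [1] * 5,
        "Temperature": [30.27, 30.21, 31.81, 34.04, 35.31],
        "Wind Speed": [5.27, 4.83, 6.09, 6.61, 6.97],
        "Year": [2003] * 5,
    },
    index=pd.to_datetime(
        [
            "2003-03-01 06:00:00-05:00",
            "2003-03-01 07:00:00-05:00",
            "2003-03-01 08:00:00-05:00",
            "2003-03-01 09:00:00-05:00",
            "2003-03-01 10:00:00-05:00",
        ]
    ),
)

bigdf["Wind Chill"] = np.nan
for row_index, row in bigdf.iterrows():
    # Write through the DataFrame, addressed by index label and column name
    bigdf.at[row_index, "Wind Chill"] = windchill(row["Temperature"], row["Wind Speed"])

# Setting one column for a selection of rows via .loc also writes into bigdf
bigdf.loc[bigdf.index[2:3], "Day"] = 3
print(bigdf)
```

For filling a whole column, the apply call (or a vectorized formula) is still simpler and usually faster than looping; the .at loop mainly helps when you genuinely need per-row control flow.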