Why doesn't this function "take" after I iterrows over a pandas DataFrame?

Question

I have a DataFrame with timestamped temperature and wind speed values, and a function to convert those into a "wind chill." I'm using iterrows to run the function on each row, and hoping to get a DataFrame out with a nifty "Wind Chill" column.

However, while it seems to work as it's going through, and has actually "worked" at least once, I can't seem to replicate it consistently. I feel like it's something I'm missing about the structure of DataFrames, in general, but I'm hoping someone can help.

In [28]: bigdf.head()
Out[28]: 


                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00  1    30.27        5.27        2003
2003-03-01 07:00:00-05:00  1    30.21        4.83        2003
2003-03-01 08:00:00-05:00  1    31.81        6.09        2003
2003-03-01 09:00:00-05:00  1    34.04        6.61        2003
2003-03-01 10:00:00-05:00  1    35.31        6.97        2003

So I add a 'Wind Chill' column to bigdf and prepopulate with NaN.

In [29]: bigdf['Wind Chill'] = NaN

Then I try to iterate over the rows, to add the actual Wind Chills.

In [30]: for row_index, row in bigdf[:5].iterrows():
    ...:     row['Wind Chill'] = windchill(row['Temperature'], row['Wind Speed'])
    ...:     print row['Wind Chill']
    ...:
24.7945889994
25.1365267133
25.934114012
28.2194307516
29.5051046953

As you can see, the new values appear to be applied to the 'Wind Chill' column. Here's the windchill function, just in case that helps:

def windchill(temp, wind):
    if temp>50 or wind<=3:
        return temp
    else:
        return 35.74 + 0.6215*temp - 35.75*wind**0.16 + 0.4275*temp*wind**0.16

But, when I look at the DataFrame again, the NaN's are still there:

In [31]: bigdf.head()
Out[31]: 

                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00  1    30.27        5.27        2003  NaN
2003-03-01 07:00:00-05:00  1    30.21        4.83        2003  NaN
2003-03-01 08:00:00-05:00  1    31.81        6.09        2003  NaN
2003-03-01 09:00:00-05:00  1    34.04        6.61        2003  NaN
2003-03-01 10:00:00-05:00  1    35.31        6.97        2003  NaN

What's even weirder is that it has worked once or twice, and I can't tell what I did differently.

I must admit I'm not especially familiar with the inner workings of pandas, and get confused with indexing, etc., so I feel like I'm probably missing something very basic here (or doing this the hard way).

Thanks!

Andy Hayden · Accepted Answer

You can use apply to do this:

In [11]: df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
                 axis=1)
Out[11]:
2003-03-01 06:00:00-05:00    24.794589
2003-03-01 07:00:00-05:00    25.136527
2003-03-01 08:00:00-05:00    25.934114
2003-03-01 09:00:00-05:00    28.219431
2003-03-01 10:00:00-05:00    29.505105

In [12]: df['Wind Chill'] = df.apply(lambda row: windchill(row['Temperature'], row['Wind Speed']),
                                    axis=1)

In [13]: df
Out[13]:
                           Day  Temperature  Wind Speed  Year  Wind Chill
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003   24.794589
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003   25.136527
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003   25.934114
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003   28.219431
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003   29.505105
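
If the row-wise apply above turns out to be slow on a large frame, the same formula can also be computed on whole columns at once. This is only a minimal sketch, assuming NumPy is available; windchill_vectorized is a hypothetical name that is not part of the original answer, and the sample values are copied from the first rows of the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (first three rows only).
df = pd.DataFrame(
    {"Temperature": [30.27, 30.21, 31.81], "Wind Speed": [5.27, 4.83, 6.09]}
)

def windchill_vectorized(temp, wind):
    """Column-wise variant of windchill(); hypothetical helper for illustration."""
    chill = 35.74 + 0.6215 * temp - 35.75 * wind**0.16 + 0.4275 * temp * wind**0.16
    # Same rule as the scalar function: return the raw temperature
    # when it is above 50 or the wind speed is 3 or less.
    return np.where((temp > 50) | (wind <= 3), temp, chill)

df["Wind Chill"] = windchill_vectorized(df["Temperature"], df["Wind Speed"])
```

The element-wise `|` replaces the scalar `or`, which cannot be applied to a whole column.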

To expand on the reason for your confusion, I think it stems from the fact that the row variables are copies rather than views of the df, so changes don't propagate:

In [21]: for _, row in df.iterrows(): row['Day'] = 2

We see that it is making the change successfully to the copy, the row variable(s):

In [22]: row
Out[22]:
Day               2.00
Temperature      35.31
Wind Speed        6.97
Year           2003.00
Name: 2003-03-01 10:00:00-05:00

But they don't propagate to the DataFrame:

In [23]: df
Out[23]:
                           Day  Temperature  Wind Speed  Year
2003-03-01 06:00:00-05:00    1        30.27        5.27  2003
2003-03-01 07:00:00-05:00    1        30.21        4.83  2003
2003-03-01 08:00:00-05:00    1        31.81        6.09  2003
2003-03-01 09:00:00-05:00    1        34.04        6.61  2003
2003-03-01 10:00:00-05:00    1        35.31        6.97  2003
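
If a loop really is wanted, the write has to go through the DataFrame itself rather than through the row. The sketch below assumes a small made-up frame with mixed int and float columns, as in the question, and uses df.at for the label-based write. The pandas docs note that whether iterrows yields a copy depends on the data types, which may be why the original loop appeared to work on occasion.

```python
import pandas as pd

# Made-up sample with mixed dtypes (int and float), like the question's frame.
df = pd.DataFrame({"Day": [1, 1, 1], "Temperature": [30.27, 30.21, 31.81]})

# Assigning to the row only changes the temporary Series built for that row.
for _, row in df.iterrows():
    row["Day"] = 2
unchanged = df["Day"].tolist()

# Writing through the DataFrame by label does persist.
for row_index in df.index:
    df.at[row_index, "Day"] = 2
changed = df["Day"].tolist()

print(unchanged, changed)
```

The second loop iterates over df.index instead of iterrows so that the frame is not modified while its rows are being generated.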

The following also leaves df unchanged:

In [24]: row = df.ix[0]  # also a copy

In [25]: row['Day'] = 2

Whereas if we do take a view, we'll see the change in df:

In [26]: row = df.ix[2:3]  # this one's a view

In [27]: row['Day'] = 3

See Returning a view versus a copy (in the docs).
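
Note that .ix was deprecated in pandas 0.20 and removed in 1.0, so the last two snippets will not run on a current install. A hedged sketch of the same update without relying on view semantics: do the selection and the assignment in a single .loc call on df. The frame here is made up, with a default integer index instead of the timestamps above.

```python
import pandas as pd

# Made-up sample; the default integer index stands in for the timestamps.
df = pd.DataFrame({"Day": [1, 1, 1, 1], "Temperature": [30.27, 30.21, 31.81, 34.04]})

# One indexing call on df itself: select the row at position 2 and the
# 'Day' column, then assign. No intermediate object is involved.
df.loc[df.index[2:3], "Day"] = 3

print(df["Day"].tolist())
```

Because nothing is assigned to an intermediate variable, the result does not depend on whether a slice happens to be a view or a copy.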
