I have a pandas DataFrame in which I need to do some simple calculations on particular data points. I ran into a problem where the result came out as NaN.
In this simplified version of what I was doing, the first attempt works fine, but the second produces NaN:
import pandas as pd
import numpy as np
df_data = {'Location' : ['Denver', 'Boulder', 'San Diego', 'Reno', 'Portland',
'Eugene', 'San Francisco'], 'State' : ['co', 'co', 'ca', 'nv',
'or', 'or', 'ca'], 'Rando_num': [18.134, 5, 34, 11, 72, 42, 9],
'Other_num': [11, 26, 55, 134, 88, 4, 22]}
df = pd.DataFrame(data = df_data)
df['Sum'] = np.nan
print(df.loc[df['Location'] == 'Denver', 'Rando_num'])
print(df.loc[df['Location'] == 'Denver', 'Other_num'])
#This works
df.loc[df['Location'] == 'Denver', 'Sum'] = (
df.loc[df['Location'] == 'Denver', 'Rando_num'] +
df.loc[df['Location'] == 'Denver', 'Other_num'])
print(df)
#This doesn't: 'Sum' for Boulder stays NaN
df.loc[df['Location'] == 'Boulder', 'Sum'] = (
df.loc[df['Location'] == 'Denver', 'Rando_num'] +
df.loc[df['Location'] == 'Reno', 'Rando_num'])
print(df)
Using df.loc to find the specific data points works fine where location is Denver but not when it is two different locations. I don't get why that is. If I add .values it fixes the problem:
df.loc[df['Location'] == 'Boulder', 'Sum'] = (
df.loc[df['Location'] == 'Denver', 'Rando_num'].values +
df.loc[df['Location'] == 'Reno', 'Rando_num'].values)
Does the community know of cases where a function like this would need the .values element to work? Or put another way, what is fundamentally different once the .values is added?
If it helps, all elements are floats and the df.loc is always a single value.
1st Case
df.loc[df['Location'] == 'Denver', 'Sum'] = (
df.loc[df['Location'] == 'Denver', 'Rando_num'] +
df.loc[df['Location'] == 'Denver', 'Other_num'])
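Notice that the same row selection is used everywhere, so both operands and the assignment target share the same index label (0). pandas adds Series label by label, so the addition works when the labels match; having the same size is not enough.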
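A minimal sketch of that rule, using small made-up Series (the names a, b, c and d are only for illustration): matching labels are added together regardless of order, while Series of equal size but with different labels produce only NaN.

```python
import pandas as pd

a = pd.Series([1.0, 2.0], index=[0, 1])
b = pd.Series([10.0, 20.0], index=[0, 1])  # same labels as a
c = pd.Series([10.0, 20.0], index=[1, 0])  # same labels, different order
d = pd.Series([10.0, 20.0], index=[5, 6])  # same size, different labels

same_labels = a + b      # added label by label
reordered = a + c        # still matched by label, not by position
disjoint = a + d         # no label in common, so every result is NaN

print(same_labels)
print(reordered)
print(disjoint)
```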
2nd Case
df.loc[df['Location'] == 'Boulder', 'Sum'] = (
df.loc[df['Location'] == 'Denver', 'Rando_num'] +
df.loc[df['Location'] == 'Reno', 'Rando_num'])
Here the two selections carry different index labels, as shown below. pandas aligns the Series on their index before adding, so a label that exists in only one operand is paired with NaN, and a number plus NaN is NaN.
>>> df.loc[df['Location'] == 'Denver', 'Rando_num']
0 18.134
Name: Rando_num, dtype: float64
>>> df.loc[df['Location'] == 'Reno', 'Rando_num']
3 11.0
Name: Rando_num, dtype: float64
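To make the alignment explicit, the result only contains the labels present in either operand (0 and 3), and neither label has a value on both sides:
Index   Left     Right   Sum
0       18.134   NaN     NaN
3       NaN      11.0    NaN
The assignment to the Boulder row (label 1) is aligned on the index as well. The result has no label 1, so 'Sum' for Boulder stays NaN.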
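The second case can be reproduced on a reduced frame; this is a sketch with only four of the original rows, and the names left, right, total and filled are mine. It also shows that assigning a Series through df.loc is aligned by label, so even a Series with real numbers at labels 0 and 3 leaves the Boulder row (label 1) as NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Denver", "Boulder", "San Diego", "Reno"],
    "Rando_num": [18.134, 5.0, 34.0, 11.0],
})
df["Sum"] = float("nan")

left = df.loc[df["Location"] == "Denver", "Rando_num"]  # index label 0
right = df.loc[df["Location"] == "Reno", "Rando_num"]   # index label 3

total = left + right
print(total.index.tolist())  # [0, 3]
print(total)                 # both entries are NaN

# fill_value=0 treats the missing side as 0, so labels 0 and 3 get numbers,
# but there is still no entry for label 1.
filled = left.add(right, fill_value=0)

# The assignment is aligned on the index too, so Boulder (label 1) gets NaN.
df.loc[df["Location"] == "Boulder", "Sum"] = filled
print(df)
```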
3rd Case
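With .values, each operand becomes a plain NumPy array with no index, so the addition is element-wise by position and no alignment takes place.
>>> a = df.loc[df['Location'] == 'Denver', 'Rando_num'].values
>>> a
array([18.134])
>>> b = df.loc[df['Location'] == 'Reno', 'Rando_num'].values
>>> b
array([11.])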
>>> a + b
array([29.134])
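Two equivalent ways to drop the index before adding, sketched on a reduced frame: Series.to_numpy(), which the pandas documentation recommends over .values, and pulling out scalars with .iloc[0]. The scalar variant assumes each mask matches exactly one row, as stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Denver", "Boulder", "Reno"],
    "Rando_num": [18.134, 5.0, 11.0],
})
df["Sum"] = float("nan")

is_denver = df["Location"] == "Denver"
is_reno = df["Location"] == "Reno"
is_boulder = df["Location"] == "Boulder"

# Option 1: convert to NumPy arrays, which are added by position.
arr_sum = (df.loc[is_denver, "Rando_num"].to_numpy()
           + df.loc[is_reno, "Rando_num"].to_numpy())
df.loc[is_boulder, "Sum"] = arr_sum
via_array = df.loc[1, "Sum"]

# Option 2: extract scalars (assumes exactly one matching row per mask).
scalar_sum = (df.loc[is_denver, "Rando_num"].iloc[0]
              + df.loc[is_reno, "Rando_num"].iloc[0])
df.loc[is_boulder, "Sum"] = scalar_sum
via_scalar = df.loc[1, "Sum"]

print(df)
```

The scalar form fails loudly with an IndexError if a mask matches no rows, which can be easier to debug than a silent NaN.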