When does pandas need to use .values to do manipulations?

Question

I have a pandas DataFrame where I need to do some simple calculations on particular data points. One of those calculations was unexpectedly producing NaN.

In this simplified version of what I was doing, the first attempt works fine, but the second produces NaN:

import pandas as pd
import numpy as np

df_data = {
    'Location': ['Denver', 'Boulder', 'San Diego', 'Reno', 'Portland',
                 'Eugene', 'San Francisco'],
    'State': ['co', 'co', 'ca', 'nv', 'or', 'or', 'ca'],
    'Rando_num': [18.134, 5, 34, 11, 72, 42, 9],
    'Other_num': [11, 26, 55, 134, 88, 4, 22],
}
df = pd.DataFrame(data = df_data)
df['Sum'] = np.nan

print(df.loc[df['Location'] == 'Denver', 'Rando_num'])
print(df.loc[df['Location'] == 'Denver', 'Other_num'])

# This works: both selections are the Denver row
df.loc[df['Location'] == 'Denver', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Denver', 'Other_num'])

print(df)

# This doesn't: Sum for Boulder stays NaN
df.loc[df['Location'] == 'Boulder', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Reno', 'Rando_num'])

print(df)

Using df.loc to select the data points works when both selections are for Denver, but not when they come from two different locations, and I don't understand why. Adding .values fixes the problem:

df.loc[df['Location'] == 'Boulder', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'].values +
        df.loc[df['Location'] == 'Reno', 'Rando_num'].values)

Are there known cases where an operation like this needs .values to work? Put another way, what is fundamentally different once .values is added?

If it helps, all elements are floats and each df.loc selection always returns a single value.

Vishnudev · Accepted Answer

1st Case

df.loc[df['Location'] == 'Denver', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Denver', 'Other_num'])

Notice that both selections use the same mask, so the two resulting Series share the same index label (0). pandas aligns Series on their index labels before doing arithmetic, and when the labels match, the values are added pairwise.
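As a minimal sketch of this first case, the two Series below are built by hand (the names left and right are made up for illustration) so that both carry index label 0, just like the two Denver selections:

```python
import pandas as pd

# Two one-element Series that share index label 0,
# mirroring the Denver 'Rando_num' and 'Other_num' selections.
left = pd.Series([18.134], index=[0], name="Rando_num")
right = pd.Series([11], index=[0], name="Other_num")

# The labels match, so the values are added pairwise.
total = left + right
print(total)
```

The result still has the single label 0, which is also the label of the Denver row being assigned to.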

2nd Case

df.loc[df['Location'] == 'Boulder', 'Sum'] = (
        df.loc[df['Location'] == 'Denver', 'Rando_num'] +
        df.loc[df['Location'] == 'Reno', 'Rando_num'])

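Here the selections are different, as shown below, so the two Series carry different index labels (0 and 3). Addition only combines values that sit at the same label; a label present on just one side is treated as missing on the other, and a number plus NaN is NaN.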

>>> df.loc[df['Location'] == 'Denver', 'Rando_num']
0    18.134
Name: Rando_num, dtype: float64

>>> df.loc[df['Location'] == 'Reno', 'Rando_num']
3    11.0
Name: Rando_num, dtype: float64

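To see the alignment, note that the result contains the union of the two indexes (labels 0 and 3), and each label is missing on one side:

Index    Left      Right    Sum
0        18.134    NaN      NaN
3        NaN       11.0     NaN

The assignment is aligned as well: the Boulder row has label 1, which is not in the result, so Sum stays NaN.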
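The alignment can be reproduced with a cut-down version of the question's frame. This is only a sketch: the frame is shortened to four rows, and Series.add with fill_value is used purely to make the alignment visible, not as a suggested fix.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["Denver", "Boulder", "San Diego", "Reno"],
    "Rando_num": [18.134, 5, 34, 11],
})

denver = df.loc[df["Location"] == "Denver", "Rando_num"]  # index label 0
reno = df.loc[df["Location"] == "Reno", "Rando_num"]      # index label 3

# Addition aligns on labels: the result covers labels 0 and 3,
# and each label is missing on one side, so both entries are NaN.
summed = denver + reno
print(summed)

# fill_value=0 substitutes 0 for the missing side, which shows
# that the two values were never paired up with each other.
filled = denver.add(reno, fill_value=0)
print(filled)
```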

3rd Case

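With .values, each selection becomes a plain NumPy array with no index, so nothing is aligned and the values are added by position:

>>> a = df.loc[df['Location'] == 'Denver', 'Rando_num'].values
>>> a
array([18.134])
>>> b = df.loc[df['Location'] == 'Reno', 'Rando_num'].values
>>> b
array([11.])
>>> a + b
array([29.134])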
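Two related ways to drop the index are sketched below, assuming (as the question states) that each selection holds exactly one value. Series.to_numpy() is the accessor the pandas documentation recommends over .values, and Series.item() returns the single element as a scalar, raising ValueError if the Series does not hold exactly one element. The frame is again shortened to four rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Location": ["Denver", "Boulder", "San Diego", "Reno"],
    "Rando_num": [18.134, 5, 34, 11],
})
df["Sum"] = np.nan

denver = df.loc[df["Location"] == "Denver", "Rando_num"]
reno = df.loc[df["Location"] == "Reno", "Rando_num"]

# Option 1: index-free NumPy arrays, added by position.
as_arrays = denver.to_numpy() + reno.to_numpy()

# Option 2: plain scalars; only valid for one-element selections.
as_scalar = denver.item() + reno.item()

# A scalar has no index, so it is written to the Boulder row as-is.
df.loc[df["Location"] == "Boulder", "Sum"] = as_scalar
print(df)
```

The scalar form fails loudly if a mask ever matches zero or several rows, whereas adding arrays of different lengths may broadcast or raise depending on their shapes.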

Tags: python, pandas

Tom