Python Pandas - difference between 'loc' and 'where'?

Question

Just curious on the behavior of 'where' and why you would use it over 'loc'.

If I create a dataframe:

import pandas as pd

df = pd.DataFrame({'ID':[1,2,3,4,5,6,7,8,9,10],
                   'Run Distance':[234,35,77,787,243,5435,775,123,355,123],
                   'Goals':[12,23,56,7,8,0,4,2,1,34],
                   'Gender':['m','m','m','f','f','m','f','m','f','m']})

And then apply the 'where' function:

df2 = df.where(df['Goals']>10)

I get the following, which keeps the rows where Goals > 10 but leaves everything else as NaN:

  Gender  Goals    ID  Run Distance
0      m   12.0   1.0         234.0
1      m   23.0   2.0          35.0
2      m   56.0   3.0          77.0
3    NaN    NaN   NaN           NaN
4    NaN    NaN   NaN           NaN
5    NaN    NaN   NaN           NaN
6    NaN    NaN   NaN           NaN
7    NaN    NaN   NaN           NaN
8    NaN    NaN   NaN           NaN
9      m   34.0  10.0         123.0

If however I use the 'loc' function:

df2 = df.loc[df['Goals']>10]

It returns the dataframe subsetted without the NaN values:

  Gender  Goals  ID  Run Distance
0      m     12   1           234
1      m     23   2            35
2      m     56   3            77
9      m     34  10           123

So essentially I am curious why you would use 'where' over 'loc/iloc' and why it returns NaN values?

Josh Friedlander · Accepted Answer

Think of loc as a filter - give me only the parts of the df that conform to a condition.

where is similar in spirit to numpy.where. It checks every element against a condition and gives you back an object of the same shape as the original: values that meet the condition are kept, and the rest are replaced, by default with NaN. A nice feature of where is that you can supply a different replacement, e.g. df2 = df.where(df['Goals']>10, other=0), to replace values that don't meet the condition with 0.

   ID  Run Distance  Goals Gender
0   1           234     12      m
1   2            35     23      m
2   3            77     56      m
3   0             0      0      0
4   0             0      0      0
5   0             0      0      0
6   0             0      0      0
7   0             0      0      0
8   0             0      0      0
9  10           123     34      m
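To make the shape difference concrete, here is a minimal sketch using the dataframe from the question. The variable names (cond, masked, filtered, capped) are only illustrative, and the single-column use of Series.where at the end is an added example, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

cond = df['Goals'] > 10

masked = df.where(cond)    # same shape as df; rows failing cond become NaN
filtered = df.loc[cond]    # only the rows where cond is True

print(masked.shape)        # (10, 4)
print(filtered.shape)      # (4, 4)

# where also works on a single column, which is often where it is most useful:
# keep goal counts above 10 and replace the rest with 0
capped = df['Goals'].where(df['Goals'] > 10, other=0)
print(capped.tolist())     # [12, 23, 56, 0, 0, 0, 0, 0, 0, 34]
```

So where is the tool to reach for when the result has to stay aligned with the original index, for example when writing it back as a new column.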

Also, while where is only for conditional replacement, loc is the standard way of selecting in Pandas, along with iloc. loc uses row and column labels, while iloc uses integer positions. Note that a label slice in loc includes its end label. So with loc you could choose to return, say, df.loc[0:1, ['Gender', 'Goals']]:

  Gender  Goals
0      m     12
1      m     23
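The label versus position distinction is easiest to see once the index is no longer 0..n-1, for instance after filtering. The following is a rough sketch under that assumption; the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

# Same selection written both ways: loc slices by label (end included),
# iloc slices by position (end excluded)
by_label = df.loc[0:1, ['Gender', 'Goals']]
col_positions = df.columns.get_indexer(['Gender', 'Goals'])
by_position = df.iloc[0:2, col_positions]
print(by_label.equals(by_position))   # True

# After filtering, labels and positions no longer coincide
filtered = df.loc[df['Goals'] > 10]
print(filtered.index.tolist())        # [0, 1, 2, 9]

last_by_label = filtered.loc[9, 'ID']                            # row labelled 9
last_by_position = filtered.iloc[3, df.columns.get_loc('ID')]    # fourth row
print(last_by_label, last_by_position)   # 10 10
```

Here filtered.iloc[9] would raise an IndexError, because the filtered frame has only four rows even though one of them carries the label 9.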

jezrael · Answer

According to the docs, DataFrame.where replaces the values in rows that do not satisfy the condition, by default with NaN, but it is possible to specify another value:

df2 = df.where(df['Goals']>10)
print (df2)
     ID  Run Distance  Goals Gender
0   1.0         234.0   12.0      m
1   2.0          35.0   23.0      m
2   3.0          77.0   56.0      m
3   NaN           NaN    NaN    NaN
4   NaN           NaN    NaN    NaN
5   NaN           NaN    NaN    NaN
6   NaN           NaN    NaN    NaN
7   NaN           NaN    NaN    NaN
8   NaN           NaN    NaN    NaN
9  10.0         123.0   34.0      m

df2 = df.where(df['Goals']>10, 100)
print (df2)
    ID  Run Distance  Goals Gender
0    1           234     12      m
1    2            35     23      m
2    3            77     56      m
3  100           100    100    100
4  100           100    100    100
5  100           100    100    100
6  100           100    100    100
7  100           100    100    100
8  100           100    100    100
9   10           123     34      m
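The two outputs above also differ in dtype: with the default NaN the integer columns are printed as floats (12.0, 1.0), while with 100 they stay integers. A small sketch of why, assuming the same dataframe; the names masked, filled and recovered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'Run Distance': [234, 35, 77, 787, 243, 5435, 775, 123, 355, 123],
                   'Goals': [12, 23, 56, 7, 8, 0, 4, 2, 1, 34],
                   'Gender': ['m', 'm', 'm', 'f', 'f', 'm', 'f', 'm', 'f', 'm']})

cond = df['Goals'] > 10

masked = df.where(cond)        # NaN cannot be stored in an int64 column
filled = df.where(cond, 100)   # an integer replacement fits, so no upcast

print(masked['Goals'].dtype)   # float64
print(filled['Goals'].dtype)   # int64

# Dropping the all-NaN rows recovers the same rows that loc selects,
# although the numeric columns remain float
recovered = masked.dropna(how='all')
print(recovered.index.tolist())                      # [0, 1, 2, 9]
print(recovered.index.equals(df.loc[cond].index))    # True
```

If the goal is simply to keep the matching rows, loc (or plain boolean indexing) avoids this round trip through NaN and keeps the original dtypes.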

The other syntax is called boolean indexing and is for filtering rows: it keeps only the rows that match the condition and drops the rest.

df2 = df.loc[df['Goals']>10]
#alternative
df2 = df[df['Goals']>10]

print (df2)
   ID  Run Distance  Goals Gender
0   1           234     12      m
1   2            35     23      m
2   3            77     56      m
9  10           123     34      m

With loc it is also possible to filter rows by condition and select columns by name(s) at the same time:

s = df.loc[df['Goals']>10, 'ID']
print (s)
0     1
1     2
2     3
9    10
Name: ID, dtype: int64

df2 = df.loc[df['Goals']>10, ['ID','Gender']]
print (df2)
   ID Gender
0   1      m
1   2      m
2   3      m
9  10      m

Tags:

python

pandas

ScoutEU
