Can anybody explain why loc is used in Python pandas, with examples like the one shown below?
for i in range(0, 2):
    for j in range(0, 3):
        df.loc[(df.Age.isnull()) & (df.Gender == i) & (df.Pclass == j+1),
               'AgeFill'] = median_ages[i,j]
The loc property is used to access a group of rows and columns by label(s) or a boolean array.
.loc[] is primarily label based, while .iloc[] is primarily integer-position based. .loc[] may also be used with a boolean array, as in the example above.
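As a rough illustration of the three access styles mentioned above, here is a minimal sketch. The toy frame and its string index labels are made up for the example; only the Age and Pclass column names are borrowed from the question.

```python
import pandas as pd

# Hypothetical toy frame with string row labels (not from the original question)
df = pd.DataFrame(
    {"Age": [22.0, 38.0, 26.0], "Pclass": [3, 1, 3]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "Age"]            # row label "b", column label "Age"
by_position = df.iloc[1, 0]              # second row, first column, by position
by_mask = df.loc[df.Pclass == 3, "Age"]  # rows where the boolean mask is True

print(by_label, by_position)
print(by_mask)
```

Here by_label and by_position pick out the same cell, while by_mask returns a Series containing the Age values of the rows labelled "a" and "c".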
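The use of .loc is recommended here because selecting rows with the mask built from df.Age.isnull(), df.Gender == i and df.Pclass == j+1 and then selecting a column in a separate step may return a view of the data frame or may return a copy. Assigning to that intermediate result is ambiguous, and pandas cannot guarantee that the write reaches the original frame.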
If you don't use .loc, you end up performing two indexing operations one after the other, which leads to a problem called chained indexing. When you use .loc, however, the row conditions and the column are handled in one step on df itself, so the assignment is unambiguous.
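The single-step assignment can be sketched end to end as follows. The toy data and the way median_ages is computed here are assumptions made for illustration (the question does not show how median_ages was built); the final loop has the same shape as the one in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; Gender is encoded 0/1 and Pclass 1..3 as in the question
df = pd.DataFrame({
    "Age":    [20.0, np.nan, 40.0, np.nan, 30.0, 50.0],
    "Gender": [0, 0, 0, 1, 1, 1],
    "Pclass": [1, 1, 1, 2, 2, 2],
})
df["AgeFill"] = df["Age"]

# Assumed construction of median_ages: median known age per (Gender, Pclass) cell
median_ages = np.full((2, 3), np.nan)
for i in range(0, 2):
    for j in range(0, 3):
        known = df.loc[(df.Gender == i) & (df.Pclass == j + 1), "Age"].dropna()
        if len(known) > 0:
            median_ages[i, j] = known.median()

# Row mask and column label go into one .loc call, so the write hits df directly
for i in range(0, 2):
    for j in range(0, 3):
        mask = df.Age.isnull() & (df.Gender == i) & (df.Pclass == j + 1)
        df.loc[mask, "AgeFill"] = median_ages[i, j]

print(df)
```

In this toy frame the two missing ages are filled in AgeFill with 30.0 and 40.0, the medians of their respective groups, while the original Age column keeps its missing values.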
You can read more about this, along with some examples of cases where not using .loc will cause the operation to fail, in the pandas documentation.
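One such failure case can be sketched like this. Note that it uses the chained form df[mask][column], which is an assumption chosen for the demonstration because selecting rows with a boolean mask returns a new object, so the write never reaches df. The warning filter is only there to keep the demo output quiet.

```python
import warnings

import pandas as pd

# Hypothetical toy frame with one missing age
df = pd.DataFrame({"Age": [22.0, None, 26.0], "AgeFill": [22.0, None, 26.0]})
mask = df.Age.isnull()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning for this demo
    df[mask]["AgeFill"] = 24.0       # chained: assigns into a temporary copy of the rows

still_missing = int(df["AgeFill"].isnull().sum())  # df itself was not changed

df.loc[mask, "AgeFill"] = 24.0       # single step: assigns into df
print(still_missing, df["AgeFill"].tolist())
```

The chained assignment leaves the missing value in place, whereas the .loc version fills it.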
The simple answer is that while you can often get away with not using .loc and simply typing (for example)
df['Age_fill'][(df.Age.isnull()) & (df.Gender == i) & (df.Pclass == j+1)] \
    = median_ages[i,j]
you'll typically get the SettingWithCopy warning, and your code will be a little messier for it.
In my experience, .loc took me a while to get my head around, and it's been a bit annoying updating my code. But it's really super simple and very intuitive: df.loc[row_index,col_indexer].
For more information see the pandas documentation on Indexing and Selecting Data.