I am doing a Kaggle tutorial for Titanic using the Datacamp platform.
I understand the use of .loc within Pandas - to select values by row using column labels...
My confusion comes from the fact that in the Datacamp tutorial, we want to locate all the "male" entries within the "Sex" column and replace them with the value 0. They use the following piece of code to do it:
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
Can someone please explain how this works? I thought .loc took inputs of row and column, so what is the == for?
Shouldn't it be:
titanic.loc["male", "Sex"] = 0
Thanks!
loc is a label-based indexer (used with square brackets, not called like a function): you pass the labels of the rows and/or columns you want to select from a pandas DataFrame or Series. iloc, by contrast, selects rows and/or columns by integer position.
Passing a single row label to loc returns that row as a Series. Besides labels, loc also accepts a boolean array or Series that marks which rows to select.
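As a minimal sketch of the difference between the two indexers, the snippet below uses a made-up frame named df with string row labels (both the name and the labels are assumptions for illustration, not part of the tutorial):

```python
import pandas as pd

# Hypothetical frame with string row labels, so labels and positions differ.
df = pd.DataFrame({"Sex": ["male", "female", "male"]}, index=["a", "b", "c"])

by_label = df.loc["b", "Sex"]   # row labelled "b", column labelled "Sex"
by_position = df.iloc[1, 0]     # second row, first column (same cell)
print(by_label, by_position)

# loc also accepts a boolean Series aligned with the index instead of labels.
mask = df["Sex"] == "male"
selected = df.loc[mask, "Sex"].tolist()
print(selected)
```

Both lookups reach the same cell here; only the way the cell is addressed differs.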
It sets the column Sex to 0 only in the rows where the condition is True; all other values are left untouched:
titanic["Sex"] == "male"
Sample:
import pandas as pd
titanic = pd.DataFrame({'Sex':['male','female', 'male']})
print (titanic)
Sex
0 male
1 female
2 male
print (titanic["Sex"] == "male")
0 True
1 False
2 True
Name: Sex, dtype: bool
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
print (titanic)
Sex
0 0
1 female
2 0
This is boolean indexing with loc: it selects only the values of column Sex in the rows where the condition is True:
print (titanic.loc[titanic["Sex"] == "male", "Sex"])
0 male
2 male
Name: Sex, dtype: object
But I think map is the better choice here if only the male and female values need to be converted to other values:
titanic = pd.DataFrame({'Sex':['male','female', 'male']})
titanic["Sex"] = titanic["Sex"].map({'male':0, 'female':1})
print (titanic)
Sex
0 0
1 1
2 0
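One thing to watch for with map, shown here as a small sketch: values that are missing from the dictionary come back as NaN. The Series s and the value "unknown" are invented for illustration and do not come from the Titanic data.

```python
import pandas as pd

# Hypothetical data: "unknown" has no entry in the mapping dictionary.
s = pd.Series(["male", "female", "unknown"])
mapped = s.map({"male": 0, "female": 1})

# The unmapped value becomes NaN rather than raising an error.
print(mapped)
print(mapped.isna().tolist())
```

If the column can contain anything besides male and female, it may be worth checking for NaN after mapping.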
EDIT:
loc is primarily used to select or set values by index and column labels:
titanic = pd.DataFrame({'Sex':['male','female', 'male']}, index=['a','b','c'])
print (titanic)
Sex
a male
b female
c male
titanic.loc["a", "Sex"] = 0
print (titanic)
Sex
a 0
b female
c male
titanic.loc[["a", "b"], "Sex"] = 0
print (titanic)
Sex
a 0
b 0
c male
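To tie this back to the question, here is a rough sketch of why titanic.loc["male", "Sex"] does not select anything in the original frame: with the default index the row labels are 0, 1, 2, so "male" is not a row label. The variable names lookup_failed and matching_labels are just for illustration.

```python
import pandas as pd

titanic = pd.DataFrame({"Sex": ["male", "female", "male"]})

# "male" is a value in the column, not a row label, so the lookup fails.
try:
    titanic.loc["male", "Sex"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# The boolean mask identifies the row labels where the condition holds.
mask = titanic["Sex"] == "male"
matching_labels = titanic.index[mask].tolist()
print(matching_labels)

# Passing those labels explicitly selects the same rows as passing the mask.
same = titanic.loc[matching_labels, "Sex"].equals(titanic.loc[mask, "Sex"])
print(same)
```

So the == comparison is what turns a condition on the values into a row selection that loc can use.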