
Python .loc confusion

I am doing a Kaggle tutorial for Titanic using the Datacamp platform.

I understand the use of .loc within Pandas - to select values by row using column labels...

My confusion comes from the fact that in the Datacamp tutorial, we want to locate all the "male" entries in the "Sex" column and replace them with the value 0. They use the following code to do it:

titanic.loc[titanic["Sex"] == "male", "Sex"] = 0

Can someone please explain how this works? I thought .loc took inputs of row and column, so what is the == for?

Shouldn't it be:

titanic.loc["male", "Sex"] = 0

Thanks!

fashioncoder asked Jul 11 '17 13:07



1 Answer

It sets column Sex to 0 only in the rows where the condition below is True; all other values are left untouched:

titanic["Sex"] == "male"

Sample:

import pandas as pd

titanic = pd.DataFrame({'Sex':['male','female', 'male']})
print (titanic)
      Sex
0    male
1  female
2    male

print (titanic["Sex"] == "male")
0     True
1    False
2     True
Name: Sex, dtype: bool

titanic.loc[titanic["Sex"] == "male", "Sex"] = 0
print (titanic)
      Sex
0       0
1  female
2       0
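The mask does not have to be written inline. As a minimal sketch (the variable name `is_male` and the value 1 for the remaining rows are my own choices, not taken from the tutorial), it can be stored first and inverted with `~` to handle the other rows:

```python
import pandas as pd

# dtype=object is an assumption made here so the column can hold
# both strings and integers regardless of the pandas version
titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']}, dtype=object)

# Boolean Series: True where the row's Sex is "male"
is_male = titanic["Sex"] == "male"

# Rows where the mask is True get 0, all remaining rows get 1
titanic.loc[is_male, "Sex"] = 0
titanic.loc[~is_male, "Sex"] = 1

print(titanic["Sex"].tolist())
```

The mask is computed once, before any assignment, so the second line still refers to the original "not male" rows even though the column has already been changed.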

This is ordinary boolean indexing with loc. Used for reading instead of assignment, the same expression selects only the values of column Sex in the rows where the condition is True:

print (titanic.loc[titanic["Sex"] == "male", "Sex"])
0    male
2    male
Name: Sex, dtype: object
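As for the form suggested in the question: the first argument of loc is matched against row labels (the index), not against values in a column. The sketch below, which assumes the default 0, 1, 2 index, shows what I would expect to happen when "male" is passed as a row label:

```python
import pandas as pd

# dtype=object is an assumption so the column can hold strings and integers
titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']}, dtype=object)

# Reading: "male" is a value in the column, not a row label, so the lookup fails
try:
    titanic.loc["male", "Sex"]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)

# Assigning: loc adds a new row labelled "male" instead of
# changing the existing rows (setting with enlargement)
titanic.loc["male", "Sex"] = 0
print(titanic.index.tolist())
print(titanic["Sex"].tolist())
```

So the shorter form would silently append a row and leave the existing "male" entries unchanged, which is why the comparison is needed to build the row selector.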

If only the male and female values need to be converted to other values, though, I think map is the better choice:

titanic = pd.DataFrame({'Sex':['male','female', 'male']})
titanic["Sex"] = titanic["Sex"].map({'male':0, 'female':1})
print (titanic)
   Sex
0    0
1    1
2    0
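One thing to keep in mind with map and a dictionary: values that have no key in the dictionary become missing. A small sketch, where 'unknown' is a made-up value used only to show the behaviour:

```python
import pandas as pd

# 'unknown' is a hypothetical value, not something from the Titanic data
sex = pd.Series(['male', 'female', 'unknown'])

# Each value is looked up in the dict; values without a key become NaN
encoded = sex.map({'male': 0, 'female': 1})
print(encoded)

# True marks the entries that had no mapping
print(encoded.isna().tolist())
```

If the column can contain anything besides the mapped values, it is worth checking for missing values after the conversion.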

EDIT:

The primary use of loc is to set new values by index and column labels:

titanic = pd.DataFrame({'Sex':['male','female', 'male']}, index=['a','b','c'])
print (titanic)
      Sex
a    male
b  female
c    male

titanic.loc["a", "Sex"] = 0
print (titanic)
      Sex
a       0
b  female
c    male

titanic.loc[["a", "b"], "Sex"] = 0
print (titanic)
    Sex
a     0
b     0
c  male
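Putting the pieces together, the row position of loc accepts a single label, a list of labels, a label slice, or a boolean mask. The following sketch reads values rather than setting them; the variable names are mine and only there for illustration:

```python
import pandas as pd

titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']},
                       index=['a', 'b', 'c'])

# Single row label: returns one value
by_label = titanic.loc["a", "Sex"]

# List of row labels: returns a Series
by_list = titanic.loc[["a", "b"], "Sex"]

# Label slice: unlike positional slicing, the end label is included
by_slice = titanic.loc["a":"b", "Sex"]

# Boolean mask: keeps the rows where the mask is True
by_mask = titanic.loc[titanic["Sex"] == "male", "Sex"]

print(by_label)
print(by_list.tolist())
print(by_slice.tolist())
print(by_mask.index.tolist())
```

Any of these selectors can also appear on the left side of an assignment, which is what the tutorial line does with the mask.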

jezrael answered Oct 01 '22 13:10