
Why/How does Pandas use square brackets with .loc and .iloc?

So .loc and .iloc are not your typical functions. They somehow use [ and ] to surround the arguments, so that they look like normal array indexing. However, I have never seen this in another library (that I can think of; maybe numpy has something like this that I'm blanking on), and I have no idea how it technically works or is defined in the Python code.

Are the brackets in this case just syntactic sugar for a function call? If so, how would one make an arbitrary function use brackets instead of parentheses? Otherwise, what is special about their use/definition in Pandas?

Conner Phillips asked Sep 12 '17 12:09




1 Answer

Note: The first part of this answer is a direct adaptation of my answer to this other question, that was answered before this question was reopened. I expand on the "why" in the second part.

So .loc and .iloc are not your typical functions

Indeed, they are not functions at all. I'll use loc in the examples; iloc is analogous (it uses a different internal class). The simplest way to check what loc actually is:

import pandas as pd
df = pd.DataFrame()
print(df.loc.__class__)

which prints

<class 'pandas.core.indexing._LocIndexer'>

This tells us that df.loc is an instance of the _LocIndexer class. The loc[] syntax works because _LocIndexer defines __getitem__ and __setitem__*, which are the methods Python calls whenever you use square brackets on an object.

So yes, brackets are, technically, syntactic sugar for a method call, just not a call to the function you thought it was. There are of course many reasons why Python is designed this way; I won't go into the details here because 1) I am not enough of an expert to give an exhaustive answer and 2) there are plenty of better resources on the web about this topic.

*Technically, it is the base class _LocationIndexer that defines those methods; I'm simplifying a bit here.
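To answer the "how would one make this" part of the question, here is a minimal pure-Python sketch of the same pattern: an attribute that returns an object whose class defines __getitem__ and __setitem__. The names ToyTable and _ToyLocIndexer are hypothetical and chosen only for illustration; this is not how Pandas implements its indexers, only the same language mechanism.

```python
class _ToyLocIndexer:
    """Hypothetical stand-in for an indexer object (not pandas' code)."""

    def __init__(self, data):
        self._data = data  # same dict object as the owning ToyTable

    def __getitem__(self, key):
        # Python calls this for: value = table.loc[key]
        return self._data[key]

    def __setitem__(self, key, value):
        # Python calls this for: table.loc[key] = value
        self._data[key] = value


class ToyTable:
    """Hypothetical container that exposes a .loc attribute."""

    def __init__(self, data):
        self._data = dict(data)

    @property
    def loc(self):
        # Attribute access returns an object; the brackets then act on it.
        return _ToyLocIndexer(self._data)


table = ToyTable({"x": 1, "y": 2})
print(type(table.loc).__name__)  # _ToyLocIndexer
print(table.loc["x"])            # 1
table.loc["y"] = 20              # goes through __setitem__
print(table.loc["y"])            # 20
```

Note that table.loc is never called with round brackets: it is an ordinary attribute, and the square brackets are applied to whatever object that attribute returns.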


Why does Pandas use square brackets with .loc and .iloc?

I'm entering speculative territory here, because I couldn't find any document that explicitly discusses design choices in Pandas. That said, I see at least two good reasons for choosing square brackets.

The first and most important reason: a function call simply can't do everything the square-bracket notation does, because assigning to a function call is a syntax error in Python:

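# contrived example to show this can't work
a = []

def f():
    # returns the module-level list; no `global` needed just to read it
    return a

f().append(1)  # OK: mutates the object that f() returned
# The next line is commented out so the snippet still runs as a whole:
# f() = dict()  # SyntaxError: cannot assign to function call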

Using round brackets for a "function" call invokes the underlying __call__ method. Note that any object whose class defines __call__ is callable, so "function" call is not quite the right term: Python doesn't care whether something is a function or just behaves like one.
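As a small sketch of the point about __call__, the hypothetical class below (Doubler is an illustrative name, not part of any library) produces instances that can be called with round brackets even though they are not functions:

```python
import types


class Doubler:
    """Hypothetical class whose instances are callable."""

    def __call__(self, value):
        return 2 * value


double = Doubler()
print(callable(double))                        # True
print(isinstance(double, types.FunctionType))  # False: not a function
print(double(21))                              # 42, dispatched to __call__
```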

Square brackets, by contrast, call either __getitem__ or __setitem__ depending on where the expression appears: __setitem__ when it is the target of an assignment, __getitem__ when its value is read. There is no way to mimic this behaviour with a function call. You would need a separate setter method to modify the data in the DataFrame, and it still could not appear on the left of an assignment:

# imaginary method-based alternative to the square bracket notation:
my_data = df.get_loc(my_index)
df.set_loc(my_index, my_data*2)

This example brings me to the second reason: consistency. You can access elements of a DataFrame via square brackets:

something = df['a']
df['b'] = 2*something

When using loc you are still referring to some items in the DataFrame, so it is more consistent to use the same syntax than to ask the user to call getter and setter functions. It is also, I believe, "more pythonic", but that's a fuzzy concept I'd rather stay away from.
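To make the dispatch between __getitem__ and __setitem__ described earlier concrete, here is a rough sketch with a hypothetical KeyRecorder class that only logs what Python passes to each method. It also shows that a subscript can carry a slice and several comma-separated keys, which arrive as a single tuple. Nothing here is Pandas code; it only illustrates the language mechanism.

```python
class KeyRecorder:
    """Hypothetical helper that records every subscript operation."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("get", key))
        return 0  # dummy value so augmented assignment has something to add to

    def __setitem__(self, key, value):
        self.calls.append(("set", key, value))


rec = KeyRecorder()
_ = rec["a"]           # read: __getitem__
rec[1:3, "b"] = 5      # assignment: __setitem__ with a (slice, label) tuple
rec["c"] += 1          # augmented assignment: __getitem__, then __setitem__

for call in rec.calls:
    print(call)
```

The augmented assignment is the case a getter/setter pair handles least gracefully: with brackets it stays a single statement, while Python performs the read and the write behind the scenes.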

GPhilo answered Oct 07 '22 06:10
