Why can you do df.loc(False)['value'] in pandas?

I do not see any documentation on pandas explaining the parameter False passed into loc. Can anyone explain how () and [] differ in this case?

user1559897 asked Aug 17 '17 13:08



2 Answers

df.loc is an instance of the _LocIndexer class, which happens to be a subclass of the _NDFrameIndexer class.

When you write df.loc(...), the indexer's __call__ method is invoked, which harmlessly returns a new instance of the same class. For example:

In [641]: df.loc
Out[641]: <pandas.core.indexing._LocIndexer at 0x10eb5f240>

In [642]: df.loc()()()()()()
Out[642]: <pandas.core.indexing._LocIndexer at 0x10eb5fe10>

...

And so on. The value passed in (...) is not used by the instance in any way.

On the other hand, the keys passed to [...] are sent to __getitem__/__setitem__, which do the actual retrieval/setting.
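To make the difference between () and [] concrete, here is a minimal sketch with a toy class. ToyIndexer is a hypothetical stand-in written for illustration, not the pandas implementation; it only mirrors the behaviour described above, where calling returns a fresh instance and square brackets do the lookup.

```python
class ToyIndexer:
    """Hypothetical stand-in for an indexer object; this is not pandas code."""

    def __init__(self, data, axis=None):
        self.data = data
        self.axis = axis

    def __call__(self, axis=None):
        # Parentheses: build a fresh instance and record the argument on it.
        return ToyIndexer(self.data, axis)

    def __getitem__(self, key):
        # Square brackets: perform the actual lookup; self.axis is never read.
        return self.data[key]


loc = ToyIndexer({"value": [1, 2, 3]})
called = loc(False)

print(called is loc)                     # False: a new object each time
print(type(called) is type(loc))         # True: same class
print(called["value"] == loc["value"])   # True: the argument changed nothing
print(loc()()()["value"])                # [1, 2, 3]
```

Because __getitem__ in this toy never looks at self.axis, loc(False)['value'] and loc['value'] return the same thing, which is the effect the question observed.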

cs95 answered Oct 21 '22 07:10


As the other answer already explains, the parentheses () invoke the __call__ method, which is defined as:

def __call__(self, axis=None):
    # we need to return a copy of ourselves
    new_self = self.__class__(self.obj, self.name)

    new_self.axis = axis
    return new_self

It returns a copy of itself, and the argument passed between the parentheses is stored as the axis attribute of that copy. This raises the question of why the value you pass makes no difference: the resulting indexer behaves exactly the same. The reason is that the superclass _NDFrameIndexer is shared by several child classes, and not all of them use this attribute.

For the .loc attribute, which is backed by the _LocIndexer class, this attribute does not matter. _LocIndexer is itself a subclass of _LocationIndexer, which is a subclass of _NDFrameIndexer.

Whenever _LocationIndexer needs an axis, it falls back to zero, with no way of specifying it yourself. As an example, here is one of the methods of the class; the others follow suit:

def __getitem__(self, key):
    if type(key) is tuple:
        key = tuple(com._apply_if_callable(x, self.obj) for x in key)
        try:
            if self._is_scalar_access(key):
                return self._getitem_scalar(key)
        except (KeyError, IndexError):
            pass
        return self._getitem_tuple(key)
    else:
        key = com._apply_if_callable(key, self.obj)
        return self._getitem_axis(key, axis=0)

So no matter what argument you pass in .loc(whatever), it is overridden by the default value. You will see the same behaviour with .iloc, which is backed by _iLocIndexer(_LocationIndexer) and therefore also overrides the axis with the default.
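The effect of a hard-coded axis can be sketched with toy classes. Everything below (BaseIndexer, FixedAxisIndexer, AxisAwareIndexer, and the list-of-rows data) is hypothetical and only imitates the structure of the quoted source: one subclass always passes axis=0, the other honours the axis stored by __call__.

```python
class BaseIndexer:
    """Hypothetical base class: __call__ stores an axis on a fresh copy."""

    def __init__(self, rows, axis=None):
        self.rows = rows  # list of equal-length rows
        self.axis = axis

    def __call__(self, axis=None):
        # Same pattern as the quoted __call__: return a copy with axis set.
        return self.__class__(self.rows, axis)

    def _getitem_axis(self, key, axis):
        if axis == 0:
            return self.rows[key]                    # select a row
        if axis == 1:
            return [row[key] for row in self.rows]   # select a column
        raise ValueError(f"No axis named {axis}")


class FixedAxisIndexer(BaseIndexer):
    def __getitem__(self, key):
        # Hard-codes axis=0, so whatever was passed to __call__ is ignored.
        return self._getitem_axis(key, axis=0)


class AxisAwareIndexer(BaseIndexer):
    def __getitem__(self, key):
        # Uses the stored axis and falls back to 0 only when none was given.
        axis = 0 if self.axis is None else self.axis
        return self._getitem_axis(key, axis)


rows = [[1, 2, 3], [4, 5, 6]]
fixed = FixedAxisIndexer(rows)
aware = AxisAwareIndexer(rows)

print(fixed(1)[0])   # [1, 2, 3]: the axis argument is ignored
print(aware(0)[0])   # [1, 2, 3]: first row
print(aware(1)[0])   # [1, 4]: first column
```

In this sketch aware(2)[0] raises ValueError, because only the axis-aware class ever reads the stored value.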

Where DOES this axis come into play then? The answer is: in the deprecated .ix indexer. I have a dataframe of shape (2187, 5), and now define:

a = df.ix(0)
b = df.ix(1)
c = df.ix(2)
a[0] == b[0]      # True
b[0] == c[0]      # True
a[0,1] == b[0,1]  # False

With simple scalar indexing, axis is still ignored in this 2-D example, because the lookup falls back to simple integer-based scalar indexing. With a[0,1], b[0,1] and c[0,1], however, the axis matters: a[0,1] has shape (2, 5), since it takes the first two entries along axis=0; b[0,1] has shape (2187, 2), since it takes the first two entries along axis=1; and c[0,1] raises ValueError: No axis named 2 for object type <class 'pandas.core.frame.DataFrame'>.

In other words:

You can still invoke the __call__ method of the _NDFrameIndexer class, because it is used by the _IXIndexer subclass. However, starting in 0.20.0 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers, and the argument passed to __call__ for .iloc and .loc is ignored.

Uvar answered Oct 21 '22 06:10