
How does pandas DataFrame.loc accept the [...] syntax?

Tags: python, pandas

I have read this documentation:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html

It shows that you can use a syntax like df.loc[df['shield'] > 6, ['max_speed']].

I looked through the source on GitHub and found the following:

Suppose you have a pandas.core.frame.DataFrame object, i.e. a DataFrame called df.

The type of df.loc is pandas.core.indexing._LocIndexer.

Nevertheless, I could not sort out these questions:

  1. How do you make a Python function or class accept a syntax like the one above?

  2. Where in the source code of pandas.core.frame.DataFrame is the property self.loc defined?

Michael S asked Jul 29 '19 12:07


1 Answer

  1. In general, you make a class accept that syntax by implementing __getitem__, which is an example of operator overloading. It allows an object of that class to be indexed with []. For example:

    class get_item_example(object):
        def __getitem__(self, key):
            # Called for gi[key]; here it just prints the key it received.
            print(key)
    

    Try it out:

    >>> gi = get_item_example()
    >>> gi['a']
    a
    >>> gi[['a','b','c']]
    ['a', 'b', 'c']
    >>> gi['a','b','c']
    ('a', 'b', 'c')
    

    In the case of df.loc[df['shield'] > 6, ['max_speed']], the key passed to __getitem__ is a tuple containing the pandas Series returned by df['shield'] > 6 and the single-item list ['max_speed'].

  2. In the pandas source, pandas.core.indexing._LocIndexer inherits an implementation of __getitem__ from pandas.core.indexing._LocationIndexer. The implementation is here: https://github.com/pandas-dev/pandas/blob/61362be9ea4d69b33ae421f1f98b8db50be611a2/pandas/core/indexing.py#L1374
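
To tie the two points together, here is a minimal sketch in plain Python (no pandas) of an object whose loc attribute accepts a [row_mask, columns] key. TinyTable and RowColIndexer are hypothetical names made up for illustration, and exposing loc through a property is just one possible design; this is not a description of how pandas itself is implemented.

```python
class RowColIndexer:
    """Hypothetical stand-in for an indexer object like the one behind df.loc."""

    def __init__(self, table):
        self.table = table

    def __getitem__(self, key):
        # obj[a, b] arrives here as the tuple (a, b); obj[a] arrives as just a.
        if isinstance(key, tuple):
            row_mask, columns = key
        else:
            row_mask, columns = key, list(self.table.data)
        # Keep only the rows whose mask entry is truthy, for each requested column.
        return {
            col: [v for v, keep in zip(self.table.data[col], row_mask) if keep]
            for col in columns
        }


class TinyTable:
    """Hypothetical toy container: a dict mapping column names to lists."""

    def __init__(self, data):
        self.data = data

    @property
    def loc(self):
        # Every attribute access builds an indexer bound to this table.
        return RowColIndexer(self)


table = TinyTable({"max_speed": [1, 4, 7], "shield": [2, 5, 8]})
mask = [s > 6 for s in table.data["shield"]]  # [False, False, True]
result = table.loc[mask, ["max_speed"]]
print(result)  # {'max_speed': [7]}
```

The key point is that table.loc is an ordinary object, so the square brackets after it are handled by that object's __getitem__, not by TinyTable.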

Alex L answered Nov 01 '22 04:11