For pandas DataFrame.__getitem__(), what are the allowed inputs (input types, really), and what results does the function produce?
I would like to write code that makes full use of DataFrame[], which is essentially DataFrame.__getitem__(). To that end, I would like information on inputs and return values at the level of detail found on the API page, which does not cover this method.
I looked for a complete spec for that function on the Pandas API page. Though many other methods are documented there, DataFrame.__getitem__() is not.
I also looked at the tutorial, but I don't believe that's attempting to be exhaustive.
I did look at the source code for DataFrame.__getitem__() (a first pass at this is described in my own answer below). It's evident that a variety of quite different types are accepted as input, but reverse engineering the code to determine what happens for each of those types can't be the intended way to master this method.
Pandas is one of the most important libraries in Python's role in science and statistics, DataFrame is arguably the most central object in Pandas, and the [] operator is arguably the most central method on DataFrame. Hence, answering the question I have posted here has very high pedagogical value, not just some utility for me.
One approach that can be used to suppress SettingWithCopyWarning is to combine the chained operations into a single loc operation. This ensures that the assignment happens on the original DataFrame instead of on a copy, so the warning should no longer be raised.
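As a minimal sketch of that idea, the snippet below replaces a chained selection-then-assignment with one loc call. The frame and its column names (`a`, `b`) are made up purely for illustration.

```python
import pandas as pd

# Hypothetical example data, not from the original post
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained form (two separate indexing steps). The assignment may land on a
# temporary copy rather than on df, which is what the warning is about:
#   df[df["a"] > 1]["b"] = 0

# Single .loc call: the row mask and the column label are passed together,
# so the assignment is applied to df itself.
df.loc[df["a"] > 1, "b"] = 0

print(df)
```

Only the rows where `a` is greater than 1 have `b` overwritten; the first row keeps its original value.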
loc performs better, but it gradually becomes slower as the number of records processed in the loop increases.
Now that I look at it, I suspect part of the lack of documentation for this function is due to the lack of doc comments in the source. In case nobody comes up with anything more user-friendly, here's the actual DataFrame.__getitem__() method:
def __getitem__(self, key):
    # shortcut if we are an actual column
    is_mi_columns = isinstance(self.columns, MultiIndex)
    try:
        if key in self.columns and not is_mi_columns:
            return self._getitem_column(key)
    except:
        pass

    # see if we can slice the rows
    indexer = _convert_to_index_sliceable(self, key)
    if indexer is not None:
        return self._getitem_slice(indexer)

    if isinstance(key, (Series, np.ndarray, list)):
        # either boolean or fancy integer index
        return self._getitem_array(key)
    elif isinstance(key, DataFrame):
        return self._getitem_frame(key)
    elif is_mi_columns:
        return self._getitem_multilevel(key)
    else:
        return self._getitem_column(key)
... which at least gives a top-level breakdown of the kinds of key (index) that DataFrame[] accepts.
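To make that breakdown a bit more concrete, here is a small sketch that exercises several of those key types and shows what comes back for each. The frame, its labels, and the variable names are invented for illustration, and this is not meant as an exhaustive spec of the method.

```python
import pandas as pd

# Hypothetical example data with a string row index
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

col = df["a"]                 # single column label -> Series
sub = df[["a", "b"]]          # list of column labels -> DataFrame
rows = df[1:3]                # integer slice -> rows by position
label_rows = df["x":"y"]      # label slice -> rows, end label included
mask_rows = df[df["a"] > 2]   # boolean Series -> rows where the mask is True
masked = df[df > 2]           # boolean DataFrame -> same shape, NaN where False

print(type(col).__name__, type(sub).__name__)
print(rows.index.tolist(), label_rows.index.tolist())
print(mask_rows.index.tolist())
print(masked)
```

Note that a scalar or list key selects columns, while a slice or a boolean Series selects rows, which is probably the least obvious part of the dispatch above.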