I find myself often having to check whether a column or row exists in a dataframe before trying to reference it. For example I end up adding a lot of code like:
    if 'mycol' in df.columns and 'myindex' in df.index:
        x = df.loc['myindex', 'mycol']
    else:
        x = mydefault
Is there any way to do this more nicely? For example, on an arbitrary object I can do x = getattr(anobject, 'id', default). Is there anything similar to this in pandas? Really, any way to achieve what I'm doing more gracefully?
The loc property accesses a group of rows and columns by label(s); it is primarily label based, but may also be used with a boolean array.
The main difference between loc[] and iloc[] is that loc[] selects DataFrame rows and columns by label/name, while iloc[] selects them by integer position. With loc[], a label that is not present raises a KeyError. With iloc[], a position that is out of bounds raises an IndexError.
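To make the two failure modes concrete, here is a small sketch. The frame, its labels, and the lookup_error helper are made up for illustration; only loc, iloc, and the exception types come from pandas and Python.

```python
import pandas as pd

# Toy frame with string row labels (illustrative data only).
df = pd.DataFrame({"mycol": [10, 20, 30]}, index=["a", "b", "c"])

def lookup_error(accessor, key):
    """Return the name of the exception raised by accessor[key], or None."""
    try:
        accessor[key]
    except (KeyError, IndexError) as exc:
        return type(exc).__name__
    return None

missing_label = lookup_error(df.loc, "z")     # label not in the index
missing_position = lookup_error(df.iloc, 10)  # position past the last row
present_label = lookup_error(df.loc, "a")     # label exists, nothing raised
```

Here missing_label should be 'KeyError' and missing_position should be 'IndexError', which is why a label lookup with a fallback usually catches KeyError.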
There is a get() method for Series that returns a default value when the key is missing.
So you could do (with numpy imported as np):
    df.mycol.get('myindex', np.nan)
Example:
    In [117]: df = pd.DataFrame({'mycol': np.arange(5), 'dummy': np.arange(5)})
              df
    Out[117]:
       dummy  mycol
    0      0      0
    1      1      1
    2      2      2
    3      3      3
    4      4      4

    [5 rows x 2 columns]

    In [118]: print(df.mycol.get(2, np.nan))
              print(df.mycol.get(5, np.nan))
    2
    nan
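Note that Series.get only covers a missing row label; attribute access such as df.mycol still fails if the column itself is absent. One possible way to cover both cases is to chain DataFrame.get with Series.get. This is only a sketch: the get_cell name and its signature are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"mycol": np.arange(5), "dummy": np.arange(5)})

def get_cell(frame, row, col, default=np.nan):
    """Hypothetical helper: value at (row, col), or default if either label is missing."""
    column = frame.get(col)          # DataFrame.get returns None for a missing column
    if column is None:
        return default
    return column.get(row, default)  # Series.get returns default for a missing label

present = get_cell(df, 2, "mycol")                  # both labels exist
missing_row = get_cell(df, 5, "mycol", default=-1)  # row label 5 is absent
missing_col = get_cell(df, 2, "nocol", default=-1)  # column is absent
```

This assumes unique row and column labels; with duplicated labels the lookups can return a Series or DataFrame instead of a single value.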
Python has a mentality of asking for forgiveness instead of permission. You'll find a lot of posts on this matter.
In Python, a try block costs almost nothing when no exception is raised, and catching an exception is relatively inexpensive, so you're encouraged to use it. This is called the EAFP approach ("easier to ask for forgiveness than permission").
For example:
    try:
        x = df.loc['myindex', 'mycol']
    except KeyError:
        x = mydefault
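If this pattern shows up in many places, the try/except can be wrapped once in a small function, which gives something close to the getattr(obj, name, default) style from the question. The loc_or_default name and the sample frame below are made up for illustration.

```python
import pandas as pd

# Toy frame with string row labels (illustrative data only).
df = pd.DataFrame({"mycol": [10, 20, 30]}, index=["a", "b", "c"])

def loc_or_default(frame, row, col, default=None):
    """Hypothetical helper: frame.loc[row, col], or default if a label is missing."""
    try:
        return frame.loc[row, col]
    except KeyError:
        # Raised by loc when either the row label or the column label is absent.
        return default

found = loc_or_default(df, "a", "mycol", default=-1)
no_row = loc_or_default(df, "z", "mycol", default=-1)
no_col = loc_or_default(df, "a", "nocol", default=-1)
```

Catching only KeyError keeps unrelated errors visible, for example a typo that passes something other than a DataFrame.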