I have a pandas DataFrame that I want to downsize (remove all rows where the value in a column is under x).
The mask is df[my_column] > 50
I would typically just use df = df[mask], but I want to avoid making a copy every time, particularly because it gets error-prone inside functions (the reassignment only changes the DataFrame within the function's scope).
What is the best way to subset a dataset inplace?
I was thinking of something along the lines of df.drop(df.loc[~mask].index, inplace=True)
Is there a better way to do this, or any situation where this won't work at all?
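A minimal sketch of the scoping problem, assuming a hypothetical DataFrame with one column named value and a threshold of 50 (the helper function names are made up for illustration). Rebinding df inside a function leaves the caller's object untouched, while drop(..., inplace=True) mutates it. To keep the rows where the mask is True, the rows passed to drop() are the ones where it is False.

```python
import pandas as pd

def filter_by_reassignment(df, col, threshold):
    # Hypothetical helper: rebinds the local name only,
    # so the caller's DataFrame is unchanged.
    df = df[df[col] > threshold]

def filter_in_place(df, col, threshold):
    # Hypothetical helper: drop(..., inplace=True) mutates the object
    # the caller passed in. We drop the rows that FAIL the keep-condition.
    df.drop(df[df[col] <= threshold].index, inplace=True)

data = pd.DataFrame({"value": [10, 60, 45, 80]})

filter_by_reassignment(data, "value", 50)
print(len(data))  # 4: the reassignment stayed inside the function

filter_in_place(data, "value", 50)
print(data["value"].tolist())  # [60, 80]
```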
Pandas DataFrame mask() method: the mask() method replaces the values where the condition evaluates to True (with NaN unless other is given); it does not drop rows. The mask() method is the opposite of the where() method.
If you use method chaining (which gives you major pandas style points), you won't need inplace at all! inplace=True prevents chaining because the method returns None instead of a DataFrame. That's a big stylistic blow, because chaining is where pandas really comes to life.
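A minimal sketch of the chaining style, using a hypothetical DataFrame with columns value and label. Filtering with .loc and a callable keeps the chain going because every step returns a new DataFrame; a method called with inplace=True returns None, so there is nothing left to chain on.

```python
import pandas as pd

raw = pd.DataFrame({"value": [10, 60, 45, 80], "label": list("abcd")})

result = (
    raw
    .loc[lambda d: d["value"] > 50]             # filter inside the chain
    .assign(doubled=lambda d: d["value"] * 2)   # keep transforming the result
    .reset_index(drop=True)
)
print(result)

# inplace=True returns None, so nothing can be chained after it.
returned = raw.copy().drop(index=0, inplace=True)
print(returned)  # None
```

Inside a function, the same idea means returning the filtered frame and rebinding it at the call site, which sidesteps the scoping issue without inplace.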
The mask() function returns an object of the same shape as self, whose entries come from self where cond is False and from other otherwise. other can be a scalar, Series, DataFrame, or callable. The mask() method is an application of the if-then idiom.
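A small sketch of mask() versus where(), using a hypothetical single-column DataFrame. It shows that both keep the original shape and only replace values, so neither one shrinks the DataFrame the way the question asks.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 60, 45, 80]})
cond = df <= 50  # boolean DataFrame with the same shape as df

masked = df.mask(cond)           # entries where cond is True become NaN
kept = df.where(~cond)           # where() with the inverted condition gives the same result
filled = df.mask(cond, other=0)  # replace with a scalar instead of NaN

print(masked["value"].tolist())  # [nan, 60.0, nan, 80.0]
print(len(masked))               # 4: mask() replaces values, it does not remove rows
```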
Select the rows that fail the condition and drop them using the inplace parameter:
df.drop(df[df.my_column <= 50].index, inplace=True)
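A runnable version of the drop() approach, with a hypothetical four-row DataFrame. It also shows one situation where dropping by index does not behave as intended: drop() works on index labels, so when the index contains duplicate labels, every row sharing a dropped label is removed. Calling reset_index(drop=True) first avoids this.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [10, 60, 45, 80]})

# Drop the rows that fail the condition, so the rows where my_column > 50 remain.
df.drop(df[df["my_column"] <= 50].index, inplace=True)
print(df["my_column"].tolist())  # [60, 80]

# Caveat: drop() works on index labels. With duplicate labels it removes
# every row that shares a label, including rows that satisfy the condition.
dup = pd.DataFrame({"my_column": [10, 60, 80]}, index=[0, 0, 1])
dup.drop(dup[dup["my_column"] <= 50].index, inplace=True)
print(dup["my_column"].tolist())  # [80]: the 60 row was dropped too
```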
You can use df.query(), like this:
bool_series = df[my_column] > 50
df.query("@bool_series",inplace=True)
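A self-contained version of the query() approach, assuming a hypothetical column named my_column stored in a variable of the same name. Inside the query string, @ refers to a local Python variable, so "@bool_series" keeps the rows where the precomputed boolean Series is True. The condition can also be written directly in the query string.

```python
import pandas as pd

my_column = "my_column"
df = pd.DataFrame({my_column: [10, 60, 45, 80]})

bool_series = df[my_column] > 50
# "@bool_series" refers to the local variable above.
df.query("@bool_series", inplace=True)
print(df[my_column].tolist())  # [60, 80]

# Equivalent, with the condition written in the query string itself.
df2 = pd.DataFrame({my_column: [10, 60, 45, 80]})
df2.query("my_column > 50", inplace=True)
print(df2[my_column].tolist())  # [60, 80]
```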