 

Pandas best way to subset a dataframe inplace, using a mask

I have a pandas DataFrame that I want to downsize by removing all rows with values under x.

The mask is df[my_column] > 50

I would typically just use df = df[mask], but I want to avoid making a copy every time, particularly because it becomes error-prone inside functions: the reassignment only changes df in the function's local scope, not the caller's DataFrame.
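A minimal sketch of the scoping problem, assuming a column literally named "my_column" and the threshold 50 from the mask above; the function names are hypothetical. Rebinding df inside a function leaves the caller's object alone, while an in-place method mutates the object the caller passed in.

```python
import pandas as pd

def filter_by_rebinding(df):
    # Rebinds the local name only; the caller's DataFrame is left untouched.
    df = df[df["my_column"] > 50]
    return df

def filter_in_place(df):
    # Mutates the object the caller passed in; drop(..., inplace=True) returns None.
    mask = df["my_column"] > 50
    df.drop(df.loc[~mask].index, inplace=True)

data = pd.DataFrame({"my_column": [10, 60, 45, 80]})

filter_by_rebinding(data)
print(len(data))  # 4: the caller still sees every row

filter_in_place(data)
print(len(data))  # 2: only rows with my_column > 50 remain
```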

What is the best way to subset a DataFrame in place?

I was thinking of something along the lines of
df.drop(df.loc[~mask].index, inplace=True)

Is there a better way to do this, or any situation where this won't work at all?

asked Oct 13 '15 at 13:10 by sapo_cosmico


People also ask

How do you mask a DataFrame?

The DataFrame.mask() method replaces values where the condition evaluates to True (with NaN by default). It is the inverse of the where() method.

Why should you probably never use pandas inplace=True?

If you use method chaining (which gives you major pandas style points), then you won't need inplace at all. inplace=True prevents chaining because the methods return None instead of a new object. That's a big stylistic blow, because chaining is where pandas really comes to life.
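A short illustration of the chaining point, using a hypothetical DataFrame: the in-place call returns None, while the non-inplace methods each return a new object that the next call can operate on.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [10, 60, 45, 80], "other": list("abcd")})

# With inplace=True the method mutates the object and returns None,
# so there is nothing to chain the next call onto.
result = df.copy().drop(columns="other", inplace=True)
print(result)  # None

# Without inplace, each method returns a new object, so calls can be chained.
chained = (
    df.loc[lambda d: d["my_column"] > 50]  # boolean mask built by a callable
      .reset_index(drop=True)
      .rename(columns={"my_column": "value"})
)
print(chained)
```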

What does DataFrame.mask() do?

mask() returns an object of the same shape as self whose entries come from self where cond is False and from other where cond is True. other can be a scalar, Series, DataFrame, or callable. The mask method is an application of the if-then idiom.
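A small sketch comparing mask(), where() and plain boolean indexing on a hypothetical Series (Series.mask works element by element like DataFrame.mask). mask() and where() keep the original length, while boolean indexing, as in df[mask] from the question, actually drops rows.

```python
import pandas as pd

s = pd.Series([10, 60, 45, 80], name="my_column")
cond = s > 50

masked = s.mask(cond)                        # True positions become NaN (dtype upcast to float)
kept = s.where(cond)                         # the inverse: False positions become NaN
replaced = s.mask(cond, 0)                   # scalar "other"
via_callable = s.mask(lambda x: x > 50, -1)  # callable condition
subset = s[cond]                             # boolean indexing removes rows

print(masked.tolist())    # [10.0, nan, 45.0, nan]
print(kept.tolist())      # [nan, 60.0, nan, 80.0]
print(replaced.tolist())  # [10, 0, 45, 0]
print(subset.tolist())    # [60, 80]
```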


2 Answers

You are missing the inplace parameter:

# drop every row that fails the mask df[my_column] > 50
df.drop(df[df[my_column] <= 50].index, inplace=True)
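A minimal sketch of the drop approach, assuming a column literally named "my_column"; the helper keep_rows_in_place is hypothetical. It also shows one situation where dropping by index labels misbehaves: drop() removes every row that carries a dropped label, so with duplicate labels it can remove rows that pass the mask.

```python
import pandas as pd

def keep_rows_in_place(df, mask):
    """Hypothetical helper: keep only rows where mask is True, mutating df.

    Assumes mask is a boolean Series in the same row order as df.
    """
    # drop() works on index labels, so duplicate labels would remove too much.
    if not df.index.is_unique:
        raise ValueError("index must be unique for label-based drop")
    df.drop(df.index[~mask.to_numpy()], inplace=True)

df = pd.DataFrame({"my_column": [10, 60, 45, 80]})
keep_rows_in_place(df, df["my_column"] > 50)
print(df["my_column"].tolist())  # [60, 80]

# Duplicate labels: label 0 is shared by a failing row (10) and a passing row (60).
dup = pd.DataFrame({"my_column": [10, 60, 45, 80]}, index=[0, 0, 1, 2])
failing = dup.index[~(dup["my_column"] > 50).to_numpy()]  # labels [0, 1]
dup.drop(failing, inplace=True)
print(dup["my_column"].tolist())  # [80]: the row holding 60 was dropped too
```

If the index may contain duplicates, resetting it first (df.reset_index(drop=True, inplace=True)) is one way to make label-based dropping safe again.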

answered Oct 15 '22 at 03:10 by Arcyno


You can use df.query(), for example:

bool_series = df[my_column] > 50
# "@" refers to a variable in the calling scope
df.query("@bool_series", inplace=True)
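A sketch of the query() variant using a string expression instead of a precomputed boolean Series, assuming the column is literally named "my_column" (a valid Python identifier; other names need backticks inside the expression). As far as I know, query(..., inplace=True) still builds the filtered result internally and then swaps it into df, so what it saves is the reassignment, not the copy.

```python
import pandas as pd

df = pd.DataFrame({"my_column": [10, 60, 45, 80]})

# query() filters with a string expression; inplace=True updates df and returns None.
ret = df.query("my_column > 50", inplace=True)
print(ret)                       # None
print(df["my_column"].tolist())  # [60, 80]

# Local variables can be referenced with "@", e.g. a hypothetical threshold.
threshold = 70
df.query("my_column > @threshold", inplace=True)
print(df["my_column"].tolist())  # [80]
```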
answered Oct 15 '22 at 04:10 by Mkelar