This has been discussed before, but with conflicting answers.
What I'm wondering is:
- Why is inplace = False the default behavior?
- Can an operation fail or misbehave because of inplace = True?
- Can I know in advance whether a given inplace = True operation will "really" be carried out in-place?
My take so far:
- Many Pandas operations have an inplace parameter, always defaulting to False, meaning the original DataFrame is untouched, and the operation returns a new DF.
- When setting inplace = True, the operation might work on the original DF, but it might still work on a copy behind the scenes, and just reassign the reference when done.
Pros of inplace = True:
- Can be faster and use less memory (reset_index() reportedly runs twice as fast and uses half the peak memory!).
Pros of inplace = False:
- Allows chained/functional syntax such as df.dropna().rename().sum()..., which is nice, and offers a chance for lazy evaluation or a more efficient re-ordering (though I don't think Pandas is doing this).
- When using inplace = True on an object which is potentially a slice/view of an underlying DF, Pandas has to do a SettingWithCopy check, which is expensive. inplace = False avoids this.
- Consistent & predictable behavior behind the scenes.
So, putting the copy-vs-view issue aside, it seems more performant to always use inplace = True, unless specifically writing a chained statement. But that's not the default Pandas opts for, so what am I missing?
Passing inplace=True to a pandas method changes the default behaviour: the operation no longer returns a new DataFrame but instead 'modifies the underlying data' (more on that later). It mutates the actual object you call it on.
When inplace=True is used, the method performs the operation on the data and returns None. When inplace=False is used, it performs the operation and returns a new copy of the data.
At its core, the inplace parameter helps you decide how you want to affect the underlying data of the Pandas object. Do you want to make a change to the dataframe object you are working on and overwrite what was there before?
Does the pandas apply() method have an inplace parameter? No, the apply() method doesn't contain an inplace parameter, unlike these pandas methods which have an inplace parameter: df.
This won’t be news to you if you’ve got experience using the inplace keyword, but just a quick recap of how it works. Inplace is a parameter accepted by a number of pandas methods which affects the behaviour of how the method runs.
When inplace=True is passed, the data is renamed in place and the method returns nothing, so you call the method as a bare statement and ignore its return value. When inplace=False is passed (this is the default value, so it isn't necessary to write it out), the method performs the operation and returns a copy of the object, so you assign the result to a name.
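To make the two calling styles concrete, here is a minimal sketch using rename(). The DataFrame and the column names are made up for illustration.

```python
import pandas as pd

# Toy data, invented for this example.
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# inplace=False (the default): df is left alone and a new DataFrame comes back.
renamed = df.rename(columns={'a': 'A'})

# inplace=True: df itself is changed and the call returns None.
returned = df.rename(columns={'b': 'B'}, inplace=True)

print(list(renamed.columns))
print(list(df.columns))
print(returned)
```

Assigning the result of an inplace=True call back to df is a common slip, since it replaces the DataFrame with None.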
In pandas, is inplace = True considered harmful, or not?
Yes, it is. Not just harmful. Quite harmful. A GitHub issue proposes deprecating the inplace argument API-wide sometime in the near future. In a nutshell, here's everything wrong with the inplace argument:
- inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
- inplace does not work with method chaining
- inplace can lead to the dreaded SettingWithCopyWarning when called on a DataFrame column, and may sometimes fail to update the column in-place
The pain points above are all common pitfalls for beginners, so removing this option will simplify the API greatly.
Let's take a look at the points above in more depth.
Performance
It is a common misconception that using inplace=True will lead to more efficient or optimized code. In general, there are no performance benefits to using inplace=True (but there are rare exceptions which are mostly a result of implementation detail in the library and should not be used as a crutch to advocate for this argument's usage). Most in-place and out-of-place versions of a method create a copy of the data anyway, with the in-place version automatically assigning the copy back. The copy cannot be avoided.
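If you want to check this on your own data rather than take it on faith, a rough timing sketch along these lines may help. The frame size, the column name and the choice of sort_values() are arbitrary assumptions, and the numbers will vary by pandas version and machine, so no expected timings are given.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test data; size and column name are assumptions for illustration.
rng = np.random.default_rng(0)
base = pd.DataFrame({'a': rng.integers(0, 1000, size=100_000)})

def out_of_place():
    df = base.copy()
    df = df.sort_values('a')           # returns a new, sorted DataFrame
    return df

def in_place():
    df = base.copy()
    df.sort_values('a', inplace=True)  # mutates df, returns None
    return df

# Both styles should produce the same sorted frame.
same_result = out_of_place().equals(in_place())

t_out = timeit.timeit(out_of_place, number=20)
t_in = timeit.timeit(in_place, number=20)
print(f"same result: {same_result}")
print(f"out-of-place: {t_out:.3f}s, in-place: {t_in:.3f}s")
```

Both functions copy the base frame first so that each run starts from unsorted data; that copy is included in both timings.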
Method Chaining
inplace=True also hinders method chaining. Contrast the working of
result = df.some_function1().reset_index().some_function2()
As opposed to
temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()
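The same contrast with real methods, as a small sketch. The data and the particular chain (dropna, reset_index, rename) are just an illustration, not anything specific from the question.

```python
import pandas as pd

# Toy data with one missing value, invented for this example.
df = pd.DataFrame({'a': [3.0, None, 1.0], 'b': ['x', 'y', 'z']})

# Chained: every step returns a new DataFrame, so the calls compose.
chained = (
    df.dropna()
      .reset_index(drop=True)
      .rename(columns={'a': 'value'})
)

# In-place: every step returns None, so each needs its own statement.
temp = df.dropna()
temp.reset_index(drop=True, inplace=True)
temp.rename(columns={'a': 'value'}, inplace=True)

print(chained.equals(temp))
```

Both versions end up with the same frame; the in-place one just cannot be written as a single expression.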
Unintended Pitfalls
One final caveat to keep in mind is that calling a method with inplace=True can trigger the SettingWithCopyWarning:
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
This can cause unexpected behavior, because it is unclear whether df2 was actually updated.
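One way to sidestep the ambiguity, sketched here on the same toy data, is to take an explicit copy of the filtered rows and assign the result of replace() back to the column instead of using inplace=True. This is one common pattern, not the only possible fix.

```python
import pandas as pd

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

# An explicit copy makes it clear that df2 owns its data.
df2 = df[df['a'] > 1].copy()

# Assign the returned Series back rather than mutating through a column.
df2['b'] = df2['b'].replace({'x': 'abc'})

print(df2['b'].tolist())
print(df['b'].tolist())
```

The original df is left untouched, and the update to df2 is visible without relying on what happens behind the scenes.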
If inplace was the default then the DataFrame would be mutated for all names that currently reference it.
A simple example, say I have a df:
df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})
Now it's very important that the DataFrame retains that row order. Let's say, for instance, it's from a data source where insertion order is key.
However, I now need to do some operations which require a different sort order:
def f(frame):
    df = frame.sort_values('a')
    # if we did frame.sort_values('a', inplace=True) here without
    # making it explicit - our caller is going to wonder what happened
    # do something
    return df
That's fine - my original df remains the same. However, if inplace=True were the default then my original df would now be sorted as a side-effect of f(), and I'd have to trust the code inside f() to remember not to do something in place that I'm not expecting, rather than have it deliberately opt in to doing something in place... So it's better that anything that can mutate an object in place does so explicitly, to at least make it more obvious what's happened and why.
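Here is a runnable sketch of that side effect. The two helper names (sorted_copy and sorted_in_place) are hypothetical and only exist for this illustration.

```python
import pandas as pd

def sorted_copy(frame):
    # Default behaviour: returns a new, sorted DataFrame.
    return frame.sort_values('a')

def sorted_in_place(frame):
    # Explicit opt-in: sorts the caller's DataFrame as a side effect.
    frame.sort_values('a', inplace=True)
    return frame

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

result = sorted_copy(df)
order_after_copy = df['a'].tolist()      # caller's row order is preserved

sorted_in_place(df)
order_after_in_place = df['a'].tolist()  # caller's row order has changed

print(order_after_copy)
print(order_after_in_place)
```

Because the mutation only happens when inplace=True is spelled out, a reader of the second helper can at least see where the caller's data gets reordered.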
Even with basic Python builtin mutables, you can observe this:
data = [3, 2, 1]

def f(lst):
    lst.sort()
    # I meant lst = sorted(lst)
    for item in lst:
        print(item)

f(data)

for item in data:
    print(item)

# huh!? What happened to my data - why's it not 3, 2, 1?
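For comparison, a sketch of the version the comment says was intended, using sorted() so the caller's list is not touched. The function name print_sorted is made up for this example.

```python
data = [3, 2, 1]

def print_sorted(lst):
    # sorted() builds a new list and leaves lst as it was.
    ordered = sorted(lst)
    for item in ordered:
        print(item)
    return ordered

ordered = print_sorted(data)

# data still holds the original order here.
for item in data:
    print(item)
```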