Understanding inplace=True

In pandas, is inplace = True considered harmful, or not?

TLDR; Yes, yes it is.

inplace, contrary to what the name implies, often does not prevent copies from being created, and (almost) never offers any performance benefits
inplace does not work with method chaining
inplace can lead to SettingWithCopyWarning if used on a DataFrame column, and may prevent the operation from going though, leading to hard-to-debug errors in code

The pain points above are common pitfalls for beginners, so removing this option will simplify the API.

I don't advise setting this parameter as it serves little purpose. See this GitHub issue which proposes the inplace argument be deprecated api-wide.

It is a common misconception that using inplace=True will lead to more efficient or optimized code. In reality, there are absolutely no performance benefits to using inplace=True. Both the in-place and out-of-place versions create a copy of the data anyway, with the in-place version automatically assigning the copy back.

inplace=True is a common pitfall for beginners. For example, it can trigger the SettingWithCopyWarning:

df = pd.DataFrame({'a': [3, 2, 1], 'b': ['x', 'y', 'z']})

df2 = df[df['a'] > 1]
df2['b'].replace({'x': 'abc'}, inplace=True)
# SettingWithCopyWarning: 
# A value is trying to be set on a copy of a slice from a DataFrame

Calling a function on a DataFrame column with inplace=True may or may not work. This is especially true when chained indexing is involved.

As if the problems described above aren't enough, inplace=True also hinders method chaining. Contrast the working of

result = df.some_function1().reset_index().some_function2()

As opposed to

temp = df.some_function1()
temp.reset_index(inplace=True)
result = temp.some_function2()

The former lends itself to better code organization and readability.

Another supporting claim is that the API for set_axis was recently changed such that inplace default value was switched from True to False. See GH27600. Great job devs!

The way I use it is

# Have to assign back to dataframe (because it is a new copy)
df = df.some_operation(inplace=False)

# No need to assign back to dataframe (because it is on the same copy)
df.some_operation(inplace=True)

CONCLUSION:

 if inplace is False
      Assign to a new variable;
 else
      No need to assign

The inplace parameter:

df.dropna(axis='index', how='all', inplace=True)

in Pandas and in general means:

1. Pandas creates a copy of the original data

2. ... does some computation on it

3. ... assigns the results to the original data.

4. ... deletes the copy.

As you can read in the rest of my answer's further below, we still can have good reason to use this parameter i.e. the inplace operations, but we should avoid it if we can, as it generate more issues, as:

1. Your code will be harder to debug (Actually SettingwithCopyWarning stands for warning you to this possible problem)

2. Conflict with method chaining

So there is even case when we should use it yet?

Definitely yes. If we use pandas or any tool for handeling huge dataset, we can easily face the situation, where some big data can consume our entire memory. To avoid this unwanted effect we can use some technics like method chaining:

(
    wine.rename(columns={"color_intensity": "ci"})
    .assign(color_filter=lambda x: np.where((x.hue > 1) & (x.ci > 7), 1, 0))
    .query("alcohol > 14 and color_filter == 1")
    .sort_values("alcohol", ascending=False)
    .reset_index(drop=True)
    .loc[:, ["alcohol", "ci", "hue"]]
)

which make our code more compact (though harder to interpret and debug too) and consumes less memory as the chained methods works with the other method's returned values, thus resulting in only one copy of the input data. We can see clearly, that we will have 2 x original data memory consumption after this operations.

Or we can use inplace parameter (though harder to interpret and debug too) our memory consumption will be 2 x original data, but our memory consumption after this operation remains 1 x original data, which if somebody whenever worked with huge datasets exactly knows can be a big benefit.

Final conclusion:

Avoid using inplace parameter unless you don't work with huge data and be aware of its possible issues in case of still using of it.

Save it to the same variable

data["column01"].where(data["column01"]< 5, inplace=True)

Save it to a separate variable

data["column02"] = data["column01"].where(data["column1"]< 5)

But, you can always overwrite the variable

data["column01"] = data["column01"].where(data["column1"]< 5)

FYI: In default inplace = False

df.rename(columns={0: 'Grades'}, inplace=True)
df

We use 'inplace=False' (this is also the default value) when we don't want to commit the changes but just print the resulting database. So, in effect a copy of the original database with the committed changes is printed without altering the original database.

Just to be more clear, the following codes do the same thing:

#Code 1
df.rename(columns={0: 'Grades'}, inplace=True)
#Code 2
df=df.rename(columns={0: 'Grades'}, inplace=False}

Related questions
                            
                                sqlalchemy IS NOT NULL select
                            
                                Efficiently updating database using SQLAlchemy ORM
                            
                                Stopping python using ctrl+c
                            
                                How to print a string at a fixed width?
                            
                                Python syntax for "if a or b or c but not all of them"
                            
                                Common use-cases for pickle in Python
                            
                                Reducing Django Memory Usage. Low hanging fruit?
                            
                                Saving interactive Matplotlib figures
                            
                                Is it possible to modify variable in python that is in outer, but not global, scope?
                            
                                How can I set the aspect ratio in matplotlib?
                            
                                How to set 'auto' for upper limit, but keep a fixed lower limit with matplotlib.pyplot
                            
                                When to use 'raise NotImplementedError'?
                            
                                Pointers in Python?
                            
                                Python - When to use file vs open
                            
                                How to convert an OrderedDict into a regular dict in python3
                            
                                ValueError when checking if variable is None or numpy.array
                            
                                Argparse: Required argument 'y' if 'x' is present
                            
                                Accessing MP3 metadata with Python [closed]
                            
                                How to select rows with NaN in particular column?
                            
                                Python (and Python C API): __new__ versus __init__

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Understanding inplace=True

Tags:

python

pandas

in-place

People also ask

In pandas, is inplace = True considered harmful, or not?

TLDR; Yes, yes it is.

So there is even case when we should use it yet?

Final conclusion:

Recent Activity

Donate For Us