
Guidelines on using the pandas inplace keyword argument

What is the guideline for using inplace?

For example,

df = df.reset_index()

or

df.reset_index(inplace=True)

Same same but different?

user3659451 asked Dec 16 '15 19:12



1 Answer

In terms of the resulting DataFrame df, the two approaches are the same. The difference lies in peak memory usage, since the in-place version does not create a second copy of the DataFrame.
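To check the "same result" claim on a small case, here is a minimal sketch. The frame contents and the names original, df_copy and df_inplace are made up for illustration; it only compares the two results with DataFrame.equals and says nothing about memory or speed.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with a non-default index (hypothetical data).
original = pd.DataFrame({"value": np.arange(5)}, index=list("abcde"))

# Out-of-place: reset_index() returns a new DataFrame that we bind to a name.
df_copy = original.reset_index()

# In-place: work on a separate copy so `original` itself stays untouched.
df_inplace = original.copy()
df_inplace.reset_index(inplace=True)

# Compare column labels, dtypes and values of the two results.
same = df_copy.equals(df_inplace)
print(same)
```

One practical consequence of the in-place form is that the result lives in the object itself, so there is no need to assign the call's return value back to df.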

Consider this setup:

import numpy as np
import pandas as pd

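def make_data():
    # 1,000,000 rows x 100 float64 columns, roughly 763 MiB of data
    return pd.DataFrame(np.random.rand(1000000, 100))

def func_copy():
    # out-of-place: reset_index() returns a new DataFrame
    df = make_data()
    df = df.reset_index()

def func_inplace():
    # in-place: the existing DataFrame is modified
    df = make_data()
    df.reset_index(inplace=True)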

We can use the memory_profiler library to perform some benchmarking for the memory usage:

%load_ext memory_profiler

%memit func_copy()
# peak memory: 1602.66 MiB, increment: 1548.66 MiB

%memit func_inplace()
# peak memory: 817.02 MiB, increment: 762.94 MiB

As expected, the in-place version is more memory efficient.
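Outside IPython, where the %memit magic is not available, a rough comparison can be sketched with the standard library's tracemalloc module, which records allocations made by NumPy as well. This is an assumption-laden sketch, not the measurement used above: the helper peak_bytes is hypothetical, the frame is much smaller so it runs quickly, and the numbers (and even which version comes out ahead) may differ across pandas versions.

```python
import tracemalloc

import numpy as np
import pandas as pd


def make_data(rows=100_000, cols=10):
    # Deliberately smaller than the 1,000,000 x 100 frame used above.
    return pd.DataFrame(np.random.rand(rows, cols))


def func_copy():
    df = make_data()
    df = df.reset_index()


def func_inplace():
    df = make_data()
    df.reset_index(inplace=True)


def peak_bytes(func):
    # Hypothetical helper: peak traced allocation (in bytes) while func runs.
    tracemalloc.start()
    try:
        func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


peak_copy = peak_bytes(func_copy)
peak_inplace = peak_bytes(func_inplace)
print(f"copy:    {peak_copy / 2**20:.1f} MiB")
print(f"inplace: {peak_inplace / 2**20:.1f} MiB")
```

tracemalloc only sees allocations routed through Python's tracking hooks, so its figures are not directly comparable to the process-level numbers that memory_profiler reports.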

There also seems to be a non-trivial difference in running time between the two approaches when the data is large enough (as in the example above):

%timeit func_copy()
# 1 loops, best of 3: 2.56 s per loop

%timeit func_inplace()
# 1 loops, best of 3: 1.35 s per loop
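For a plain Python script, the same kind of timing can be approximated with the standard library's timeit module. This is a sketch under assumptions: the frame is scaled down so it finishes quickly, the variable names are illustrative, and the relative timings may not match the figures above on other pandas versions or hardware.

```python
import timeit

import numpy as np
import pandas as pd


def make_data(rows=100_000, cols=10):
    # Deliberately smaller than the 1,000,000 x 100 frame used above.
    return pd.DataFrame(np.random.rand(rows, cols))


def func_copy():
    df = make_data()
    df = df.reset_index()


def func_inplace():
    df = make_data()
    df.reset_index(inplace=True)


# One call per trial, three trials each; keep the best (smallest) time.
best_copy = min(timeit.repeat(func_copy, number=1, repeat=3))
best_inplace = min(timeit.repeat(func_inplace, number=1, repeat=3))

print(f"copy:    {best_copy:.4f} s")
print(f"inplace: {best_inplace:.4f} s")
```

Both functions include the cost of make_data(), just as in the benchmark above, so the difference between the two numbers matters more than either absolute value.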

These differences may or may not be significant, depending on the use case (e.g. ad hoc exploratory analysis vs. production code), the data size and the hardware resources available. In general, it might be a good idea to use the in-place version whenever possible for better memory and runtime efficiency.

YS-L answered Sep 21 '22 04:09