
Pandas manipulating a DataFrame inplace vs not inplace (inplace=True vs False) [duplicate]

I'm wondering if there's a significant reduction in memory usage when we choose to manipulate a dataframe in-place (compared to not in-place).

I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).

If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?

Context:

I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:

in-place:

df.sort('some_column', ascending=0, inplace=1)
top = df.iloc[0]

vs

copy:

top = df.sort('some_column', ascending=0).iloc[0]

For the 'copy' case, sorting still allocates memory for the copy even though I'm not assigning the copy to a variable, right? If so, how long does it take to deallocate that copy from memory?

Thanks for any insights in advance!

asked Nov 12 '17 at 04:11 by tooesy



1 Answer

In general, there is no difference between inplace=True and returning an explicit copy - in both cases, a copy is created. It just so happens that, in the first case, the data in the copy is copied back into the original df object, so reassignment is not necessary.
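As a minimal sketch of the behavior a caller can observe (it does not measure memory, and the three-row frame is made-up illustration data), the two call styles differ only in what is returned and which object ends up holding the sorted data:

```python
import pandas as pd

# Made-up data, purely for illustration.
df = pd.DataFrame({"some_column": [3, 1, 2]})

# Not in place: a new DataFrame comes back and df keeps its original order.
sorted_copy = df.sort_values("some_column", ascending=False)
unchanged_order = df["some_column"].tolist()

# In place: the call returns None and df itself now holds the sorted data.
result = df.sort_values("some_column", ascending=False, inplace=True)
inplace_order = df["some_column"].tolist()

print(unchanged_order, result, inplace_order)
```

Because the in-place call returns None, it cannot be chained the way the non-in-place call can.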

Furthermore, note that df.sort is deprecated (and has been removed entirely in later pandas releases); use df.sort_values instead.
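For reference, here is a rough sketch of the two snippets from the question rewritten with sort_values. The frame contents and the label column are hypothetical stand-ins for the asker's data; only some_column comes from the question.

```python
import pandas as pd

# Hypothetical data standing in for the question's df.
df = pd.DataFrame({"some_column": [10, 40, 20], "label": ["a", "b", "c"]})

# Copy variant: the sorted frame is a temporary that is never bound to a name.
top_copy = df.sort_values("some_column", ascending=False).iloc[0]

# In-place variant: df itself is reordered, then the first row is read.
df.sort_values("some_column", ascending=False, inplace=True)
top_inplace = df.iloc[0]

print(top_copy["label"], top_inplace["label"])
```

Both variants pick the same row, so the choice between them is mostly a matter of whether the original row order of df still matters afterwards.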

answered Oct 16 '22 at 04:10 by cs95