I'm wondering if there's a significant reduction in memory usage when we choose to manipulate a dataframe in-place (compared to not in-place).
I've done a bit of searching on Stack Overflow and came across this post where the answer states that if an operation is not done in-place, a copy of the dataframe is returned (I guess that's a bit obvious when there's an optional parameter called 'inplace' :P).
If I don't need to keep the original dataframe around, it would be beneficial (and logical) to just modify the dataframe in place, right?
Context:
I'm trying to get the top element when sorted by a particular 'column' in the dataframe. I was wondering which of these two is more efficient:
in-place:
df.sort('some_column', ascending=0, inplace=1)
top = df.iloc[0]
vs
copy:
top = df.sort('some_column', ascending=0).iloc[0]
For the 'copy' case, it still allocates memory for the copy when sorting even though I'm not assigning the copy to a variable, right? If so, how long does it take to deallocate that copy from memory?
Thanks for any insights in advance!
Many pandas operations have an inplace parameter that defaults to False, meaning the original DataFrame is left untouched and the operation returns a new DataFrame. With inplace=True, the operation might modify the original DataFrame directly, but it might still work on a copy behind the scenes and just reassign the reference when done.
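The difference in calling convention can be seen in a minimal sketch. The toy DataFrame and variable names are assumptions made for illustration, and the snippet only shows what each call returns, not how much memory pandas uses internally.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the calling convention.
df = pd.DataFrame({"some_column": [3, 1, 2]})

# Default (inplace=False): a new DataFrame is returned and df keeps its order.
sorted_df = df.sort_values("some_column", ascending=False)
original_order = df["some_column"].tolist()
sorted_order = sorted_df["some_column"].tolist()

# inplace=True: the method returns None and df itself is reordered.
result = df.sort_values("some_column", ascending=False, inplace=True)
inplace_order = df["some_column"].tolist()
```

Because the in-place call returns None, it cannot be chained the way the copying call can.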
The inplace parameter is accepted by a number of pandas DataFrame methods, including:
1. dropna()
2. sort_values()
3. drop_duplicates()
4. query()
5. fillna()
6. reset_index()
7. rename()
8. sort_index()
Yes, many pandas functions have an inplace parameter, but by default it is set to False. So when you call df.dropna(axis='index', how='all', inplace=False), pandas assumes that you do not want to change the original DataFrame and instead creates a new copy with the required changes.
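As a rough illustration of the dropna call above, the following sketch uses a made-up DataFrame (the column names and values are assumptions) to show that the default leaves the original untouched, and that rebinding the name is the usual alternative to inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the middle row is entirely NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, np.nan, 6.0]})

# inplace=False (the default): the all-NaN row is dropped in the returned frame only.
cleaned = df.dropna(axis="index", how="all")
rows_before = len(df)        # df still has all three rows
rows_cleaned = len(cleaned)  # the returned frame has two

# Rebinding the name gives the same end result as inplace=True.
df = df.dropna(axis="index", how="all")
rows_after = len(df)
```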
Only when there are no more references to the old array of values will pandas reshape according to the mask. A better rule of thumb is: inplace is available when the operation doesn’t require allocating a new backing ndarray of values. After df = df.an_operation, the old dataframe no longer takes up space in RAM, does it?
In general, there is no difference between inplace=True and returning an explicit copy - in both cases, a copy is created. It just so happens that, in the first case, the data in the copy is copied back into the original df object, so reassignment is not necessary.
Furthermore, note that as of v0.21, df.sort is deprecated, use sort_values instead.
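For reference, here is the 'copy' variant from the question rewritten with sort_values, as a sketch on made-up data. The second approach, using idxmax, is not mentioned in the thread; it is shown as an assumption about what may fit when only the single top row is needed and ties do not matter, since it avoids sorting the whole frame.

```python
import pandas as pd

# Hypothetical data; "label" is only there to identify the selected row.
df = pd.DataFrame({"some_column": [10, 30, 20], "label": ["a", "b", "c"]})

# Copy variant from the question, with sort_values in place of the old df.sort.
top_sorted = df.sort_values("some_column", ascending=False).iloc[0]

# Alternative without sorting: find the index label of the maximum, then select that row.
top_idxmax = df.loc[df["some_column"].idxmax()]
```

Both expressions select the same row here. Whether the second is faster on a given dataset would need to be measured.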