I'm profiling a program that uses Pandas to process some CSVs. I'm using psutil's Process.memory_info to report the Virtual Memory Size (vms) and the Resident Set Size (rss) values, and Pandas' DataFrame.memory_usage (df.memory_usage().sum()) to report the size of my dataframes in memory.
There's a conflict between the reported vms and df.memory_usage values: Pandas reports more memory just for the dataframe than the Process.memory_info call reports for the whole (single-threaded) process.
For example:
- rss: 334671872 B
- vms: 663515136 B
- df.memory_usage().sum(): 670244208 B
The Process.memory_info call is made immediately after the memory_usage call. I expected df.memory_usage().sum() < vms to hold at all times, but it doesn't. I assume I'm misinterpreting the meaning of the vms value?
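For reference, the measurement looks roughly like this (a minimal sketch; data.csv stands in for my real input):
import os
import psutil
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical input file
df_bytes = df.memory_usage().sum()  # per-column byte counts, summed

mem = psutil.Process(os.getpid()).memory_info()  # taken immediately afterwards
print("rss:", mem.rss, "B")
print("vms:", mem.vms, "B")
print("df.memory_usage().sum():", df_bytes, "B")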
Total memory usage of a Pandas DataFrame with info(): to get the full memory usage, pass the memory_usage="deep" argument to info(). You get all the basic information about the dataframe, and at the end of the output a line such as "memory usage: 1.1 MB".
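A minimal sketch of that call (the DataFrame contents here are made up):
import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [1.0, 2.0]})
# memory_usage="deep" makes pandas introspect object-dtype columns (e.g. strings)
# instead of only counting the 8-byte pointers that hold them
df.info(memory_usage="deep")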
Another option is memory_profiler: put its @profile decorator around any function or method and run python -m memory_profiler myscript.py. You'll see line-by-line memory usage once your script exits.
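A sketch of that workflow (myscript.py and the CSV path are placeholders):
# myscript.py -- run with: python -m memory_profiler myscript.py
import pandas as pd

@profile  # injected into builtins by memory_profiler when run via -m
def load():
    return pd.read_csv("data.csv")  # hypothetical input file

if __name__ == "__main__":
    load()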
A related reference for your problem is the question "use rss or vms to track memory". The relationship between RSS and VMS is a bit confusing.
**TO SUMMARIZE AND COMPLEMENT MY OPINION:**
RSS:
Resident Set Size shows how much of a process's memory is currently held in RAM. Remember: it does not include memory that has been swapped out. It does include memory from shared libraries (as long as those pages are resident), as well as all stack and heap memory.
VMS:
Virtual Memory Size includes all memory that the process can access: memory that is swapped out, memory that is allocated but not yet used, and memory from shared libraries.
Example:
Let's assume Process-X has a 500K binary, is linked against 2500K of shared libraries, and has 200K of stack/heap allocations of which 100K is actually in memory (the rest is swapped out or unused). So far it has actually loaded 1000K of the shared libraries and 400K of its own binary. Then:
RSS: 400K + 1000K + 100K = 1500K
VMS: 500K + 2500K + 200K = 3200K
Since part of this memory is shared, many processes may be using it, so if you add up all of the RSS values across processes you can easily end up with more space than your system has.
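To restate the example's arithmetic in code (the numbers are the hypothetical ones from above):
# sizes in K, taken from the Process-X example
binary_total, libs_total, stack_heap_total = 500, 2500, 200  # everything mapped
binary_res, libs_res, stack_heap_res = 400, 1000, 100        # actually resident

rss = binary_res + libs_res + stack_heap_res         # 1500K: only what is in RAM
vms = binary_total + libs_total + stack_heap_total   # 3200K: everything addressable
print(f"RSS = {rss}K, VMS = {vms}K")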
As you can see when you simply run this:
import os
import psutil

# inspect the current process
process = psutil.Process(os.getpid())
print("vms:", process.memory_info().vms)
print("rss:", process.memory_info().rss)
Output:
vms: 7217152
rss: 13975552
By simply adding import pandas as pd, you can see the difference:
import os
import psutil
import pandas as pd  # importing pandas maps its shared libraries into the process

process = psutil.Process(os.getpid())
print("vms:", process.memory_info().vms)
print("rss:", process.memory_info().rss)
Here is the output:
vms: 276295680
rss: 54116352
So memory that is allocated may not appear in RSS until the program actually uses (touches) it. If your program allocates a bunch of memory up front and then uses it over time, you could see RSS going up while VMS stays the same.
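A sketch that makes this effect visible (Linux behavior; exact numbers depend on the OS and allocator):
import os
import psutil
import numpy as np

process = psutil.Process(os.getpid())

print("before:    rss =", process.memory_info().rss)
buf = np.empty(200_000_000, dtype=np.uint8)  # ~200 MB reserved, pages not yet touched
print("allocated: rss =", process.memory_info().rss)  # VMS jumps, RSS barely moves
buf[:] = 1  # writing every page faults it into RAM
print("touched:   rss =", process.memory_info().rss)  # now RSS jumps too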
Now, whichever you go with, df.memory_usage().sum() or Process.memory_info, keep in mind that RSS does include memory from dynamically linked libraries. Since those pages are shared, summing the RSS of several processes counts them repeatedly, so the total will be more than the actual memory used.
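If you want a per-process figure that avoids that double counting, psutil also exposes the USS (Unique Set Size, the memory that would be freed if the process exited) via memory_full_info(); note it is slower to compute and may require elevated privileges on some platforms:
import os
import psutil

process = psutil.Process(os.getpid())
# uss counts only pages unique to this process, so shared library
# pages mapped by other processes are excluded
print("uss:", process.memory_full_info().uss)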