When running a PyTorch training program with num_workers=32 for the DataLoader, htop shows 33 python processes, each with 32 GB of VIRT and 15 GB of RES.
Does this mean that the PyTorch training is using 33 processes × 15 GB = 495 GB of memory? htop shows only about 50 GB of RAM and 20 GB of swap in use on the entire machine, which has 128 GB of RAM. So, how do we explain the discrepancy?
Is there a more accurate way of calculating the total amount of RAM being used by the main PyTorch program and all its child DataLoader worker processes?
Thank you
Memory management. PyTorch uses a caching memory allocator to speed up memory allocations. This allows fast memory deallocation without device synchronizations. However, the unused memory managed by the allocator will still show as if used in nvidia-smi.
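A minimal sketch of how this shows up in practice (it assumes a CUDA-capable machine); the allocated/reserved split below is why nvidia-smi can report more memory than your tensors actually occupy:

```python
import torch

x = torch.randn(1024, 1024, device="cuda")  # allocates ~4 MB of GPU memory
print(torch.cuda.memory_allocated())        # bytes actually in use by tensors
print(torch.cuda.memory_reserved())         # bytes held by the caching allocator

del x                                       # tensor freed...
print(torch.cuda.memory_allocated())        # ...so allocated drops back toward 0
print(torch.cuda.memory_reserved())         # but reserved stays cached for reuse
torch.cuda.empty_cache()                    # release cached blocks back to the driver
```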
Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset. The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.
num_workers tells the DataLoader instance how many subprocesses to use for data loading. If num_workers is zero (the default), the data is loaded in the main process and the GPU has to wait for the CPU to load it. In theory, the greater num_workers is, the more efficiently the CPU loads data and the less the GPU has to wait.
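For example, a minimal sketch (the dataset shape, batch size, and worker count below are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(
    torch.randn(10_000, 3, 32, 32),   # fake images
    torch.randint(0, 10, (10_000,)),  # fake labels
)
loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=4,  # 4 subprocesses prefetch batches; 0 = load in the main process
)

if __name__ == "__main__":  # guard required on platforms that spawn workers (Windows/macOS)
    for images, labels in loader:
        ...  # training step goes here
```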
In CUDA terms, pinned memory does not mean GPU memory but non-paged (page-locked) CPU memory. The benefits and rationale are provided here, but the gist of it is that this flag allows the x.cuda() operation (which you still have to execute as usual) to avoid one implicit CPU-to-CPU copy from pageable to pinned memory, which makes it a bit more performant.
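A sketch of how the flag is typically used together with a non-blocking transfer (it assumes a CUDA device is available):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 128))
loader = DataLoader(dataset, batch_size=64, pin_memory=True)  # batches land in page-locked RAM

for (batch,) in loader:
    # The source tensor is already pinned, so this host-to-device copy skips
    # the implicit pageable-to-pinned staging copy and can overlap with
    # computation when non_blocking=True.
    batch = batch.cuda(non_blocking=True)
```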
Does this mean that the PyTorch training is using 33 processes X 15 GB = 495 GB of memory?
Not necessarily. VIRT is address space the process has reserved, not RAM it actually uses, and because the DataLoader workers are forked from the main process, much of each worker's RES consists of copy-on-write pages shared with the parent, so summing RES across all 33 processes counts the same physical memory many times over. You have a main process with several worker subprocesses, and the CPU has several cores. One worker usually loads one batch, so the next batch can already be loaded and ready to go by the time the main process asks for another. This prefetching is the secret of the speed-up.
I would guess you should use far fewer num_workers. It would also be interesting to know your batch size, which you can tune for the training process as well.
Is there a more accurate way of calculating the total amount of RAM being used by the main PyTorch program and all its child DataLoader worker processes?
I googled but could not find a concrete formula; I think it is only a rough estimate based on how many cores your CPU has, how much memory is available, and your batch size. Choosing num_workers depends on what kind of machine you are using, what kind of dataset you are working with, and how much on-the-fly pre-processing your data requires.
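That said, for measuring (rather than estimating) the RAM actually in use, summing RES overcounts shared pages. A sketch of a fairer total using the psutil package, assuming Linux (the PID below is hypothetical): PSS (proportional set size) divides each shared page among the processes sharing it, so memory the forked workers inherit from the main process is not counted 33 times.

```python
import psutil

main = psutil.Process(12345)  # hypothetical PID of the main training process
procs = [main] + main.children(recursive=True)

# memory_full_info() reads /proc/<pid>/smaps, which exposes PSS on Linux.
total_pss = sum(p.memory_full_info().pss for p in procs)
print(f"{len(procs)} processes, ~{total_pss / 2**30:.1f} GiB of RAM in use")
```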
HTH