
How to free up all the GPU memory PyTorch has taken

Tags:

python

python-3.x

out-of-memory

gpu

pytorch

I have some high-level code in which model training etc. is wrapped by the pipeline_network class. My main goal is to train a new model for every new fold.

for fold, (train_idx, valid_idx) in enumerate(cv.split(meta_train[DEPTH_COLUMN].values.reshape(-1))):
    meta_train_split, meta_valid_split = meta_train.iloc[train_idx], meta_train.iloc[valid_idx]
    pipeline_network = unet(config=CONFIG, suffix='fold' + str(fold), train_mode=True)

But when I move on to the 2nd fold, everything fails with a GPU out-of-memory error:

RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

At the end of each epoch I tried to manually delete that pipeline, with no luck:

import gc

import torch


def clean_object_from_memory(obj):  # definition
    del obj
    gc.collect()
    torch.cuda.empty_cache()

clean_object_from_memory(pipeline_network)  # calling
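A likely reason this helper has no effect is plain Python scoping: del obj inside a function removes only the function's local name, while the caller's pipeline_network variable still refers to the same object, so nothing becomes unreachable. A minimal pure-Python sketch (no GPU needed), where the Pipeline class is a hypothetical stand-in for the unet pipeline and weakref is used only to observe when the object is actually freed:

```python
import gc
import weakref


class Pipeline:
    # Hypothetical stand-in for the object returned by unet(...).
    pass


def clean_object_from_memory(obj):
    # Removes only this function's local name `obj`; the caller's
    # variable still refers to the same object. (GPU call omitted here.)
    del obj
    gc.collect()


pipeline_network = Pipeline()
probe = weakref.ref(pipeline_network)  # a weak reference does not keep it alive

clean_object_from_memory(pipeline_network)
still_alive_after_helper = probe() is not None  # caller still holds a reference

del pipeline_network  # drop the last strong reference, in the caller's scope
gc.collect()
freed_after_del = probe() is None
```

Here still_alive_after_helper ends up True and freed_after_del ends up True: the object is released only once the reference in the scope that created it is deleted.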

Calling this didn't help either:

import gc

import torch


def dump_tensors(gpu_only=True):
    torch.cuda.empty_cache()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    del obj
                    gc.collect()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.data.is_cuda:
                    del obj
                    gc.collect()
        except Exception:
            pass
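In this loop, del obj likewise only unbinds the loop variable, and the list returned by gc.get_objects() itself holds a reference to every object while it is being iterated, so no tensor can be freed this way. A read-only version of the same walk is still useful for diagnosis: it shows which tensors are alive between folds, and therefore what is still holding GPU memory. A sketch, assuming PyTorch is installed; list_live_tensors is a hypothetical helper name:

```python
import gc

import torch


def list_live_tensors(gpu_only=True):
    # Read-only walk over the objects tracked by the garbage collector.
    # Nothing is deleted: this only reports which tensors are still alive.
    found = []
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and (obj.is_cuda or not gpu_only):
                found.append((type(obj).__name__, tuple(obj.shape), str(obj.device)))
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            continue
    return found
```

Calling list_live_tensors() right after a fold finishes and printing the result makes leftover model parameters (reported as Parameter) or cached batches easy to spot.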

How can I reset PyTorch when I move on to the next fold?


asked Sep 06 '18 13:09

Rocketq

1 Answer

Try deleting the object with del and then calling torch.cuda.empty_cache(). This releases the unused memory held by PyTorch's caching allocator back to the GPU, provided no other references to the object or its tensors remain.
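A minimal sketch of that pattern applied to a per-fold loop, assuming PyTorch is installed. The tiny nn.Linear model, SGD optimizer and random batches are hypothetical stand-ins for the unet pipeline. The key point is that del runs in the loop's own scope, so the names that actually hold the GPU tensors are dropped before empty_cache() is called; the sketch falls back to the CPU (and skips the cache call) when no GPU is available.

```python
import gc

import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
fold_losses = []

for fold in range(3):
    # Hypothetical stand-in for unet(config=CONFIG, ...): a tiny model and
    # optimizer that live on the GPU when one is available.
    model = nn.Linear(16, 1).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(32, 16, device=device)
    y = torch.randn(32, 1, device=device)
    loss = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Keep a Python float, not the loss tensor, so neither the autograd
    # graph nor any GPU storage outlives the fold.
    fold_losses.append(loss.item())

    # Drop every name referring to GPU tensors, collect reference cycles,
    # then let the caching allocator release its now-unused blocks.
    del model, optimizer, x, y, loss
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        print(f"fold {fold}: {torch.cuda.memory_allocated(device)} bytes allocated")
```

In the question's code this corresponds to del pipeline_network (together with any optimizer, batch or loss tensors kept around) at the end of the loop body, rather than inside a helper function.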


answered Oct 11 '22 12:10

HzCheng
