I have a next-step prediction model for time series, which is simply a GRU with a fully-connected layer on top of it. When I train it on the CPU, the loss is 0.10 after 50 epochs, but when I train it on the GPU the loss is 0.15 after 50 epochs. Running more epochs does not really lower the loss in either case.
Why is performance after training on the CPU better than on the GPU?
I have tried changing the random seeds for both data and model, and these results are independent of the random seeds.
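For reference, the seeding was along these lines (a sketch; the exact calls and seed value are illustrative, not my actual code):

    # Illustrative seeding: fix every relevant RNG so CPU and GPU runs can be
    # compared across repeated trials.
    import random
    import numpy as np
    import torch

    seed = 42  # illustrative value
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)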
I have:
Python 3.6.2
PyTorch 0.3.0
CUDNN_MAJOR 7
CUDNN_MINOR 0
CUDNN_PATCHLEVEL 5
Edit:
I also use PyTorch's weight normalization, torch.nn.utils.weight_norm, on the GRU and on the fully-connected layer.
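For completeness, here is a minimal sketch of the kind of model described above; the class name, layer sizes and other details are illustrative, not my exact code:

    import torch
    import torch.nn as nn
    from torch.nn.utils import weight_norm

    class NextStepGRU(nn.Module):
        def __init__(self, input_size=1, hidden_size=64):
            super(NextStepGRU, self).__init__()
            gru = nn.GRU(input_size, hidden_size, batch_first=True)
            # nn.GRU has no parameter literally named "weight", so weight_norm
            # is applied to each weight matrix by name.
            for name in ['weight_ih_l0', 'weight_hh_l0']:
                gru = weight_norm(gru, name=name)
            self.gru = gru
            self.fc = weight_norm(nn.Linear(hidden_size, input_size))

        def forward(self, x, hidden=None):
            # x: (batch, seq_len, input_size) because batch_first=True
            out, hidden = self.gru(x, hidden)
            # predict the next value from the last time step
            return self.fc(out[:, -1, :]), hidden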
After trying many things, I think I found the problem. Apparently the cuDNN backend does not behave correctly in this setup. I don't know whether it is a bug in PyTorch or a bug in cuDNN, but disabling cuDNN with

    torch.backends.cudnn.enabled = False

solves the problem. With the line above, training on the GPU gives the same loss at the same epoch as training on the CPU.
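For clarity, a sketch of where the flag goes; it just has to run before the first forward pass on the GPU (the model name below refers to the illustrative sketch in the question):

    import torch

    # Disable the cuDNN backend globally so the GRU falls back to PyTorch's
    # native CUDA kernels. Set this before the first forward pass.
    torch.backends.cudnn.enabled = False

    model = NextStepGRU().cuda()  # illustrative model from the question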
Edit:
It seems to be the interaction of weight normalization and cuDNN that causes the problem. If I remove weight normalization, it works. If I disable cuDNN, it works. It seems that only the combination of the two fails in PyTorch.
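A related check (a sketch, again using the illustrative model from the question): the weight_norm reparameterization can be stripped with torch.nn.utils.remove_weight_norm, which lets the cuDNN path stay enabled:

    from torch.nn.utils import remove_weight_norm

    # Undo weight_norm on the GRU weights and on the fully-connected layer so
    # the cuDNN GRU kernels can be kept enabled. Parameter names match the
    # sketch in the question.
    for name in ['weight_ih_l0', 'weight_hh_l0']:
        remove_weight_norm(model.gru, name=name)
    remove_weight_norm(model.fc)  # default name='weight'

Stripping the reparameterization before training is effectively the same as not using weight normalization at all, so this trades the normalization for keeping the cuDNN speed-up.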