Why is PyTorch 2x slower than Keras for an identical model and hyperparameters?

I've experienced this with custom-made modules as well, but for this example I'm specifically using one of the official PyTorch examples and the MNIST dataset.

I've ported the exact architecture to Keras on TF2 with eager mode, like so:

from tensorflow import keras  # using the Keras bundled with TF2

model = keras.models.Sequential([
    keras.layers.Conv2D(32, (3, 3), input_shape=(28, 28, 1), activation='relu'),
    keras.layers.Conv2D(64, (3, 3)),
    keras.layers.MaxPool2D((2, 2)),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

model.summary()

model.compile(optimizer=keras.optimizers.Adadelta(),
              loss=keras.losses.sparse_categorical_crossentropy,
              metrics=['accuracy'])

model.fit(train_data, train_labels, batch_size=64, epochs=30,
          shuffle=True, max_queue_size=1)

The training loop in PyTorch is:

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

I time every epoch like so:

for epoch in range(1, args.epochs + 1):
    since = time.time()
    train(args, model, device, train_loader, optimizer, epoch)
    # test(args, model, device, test_loader)
    # scheduler.step()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
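
Since CUDA kernels launch asynchronously, wall-clock timing like the above is only reliable if the device is synchronized around the timed region; a minimal sketch of a stricter per-epoch timing (assuming a CUDA device is used), for reference:

import time
import torch

for epoch in range(1, args.epochs + 1):
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # flush any pending kernels before starting the clock
    since = time.time()
    train(args, model, device, train_loader, optimizer, epoch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # wait for this epoch's GPU work to actually finish
    time_elapsed = time.time() - since
    print('Epoch {} took {:.0f}m {:.0f}s'.format(
        epoch, time_elapsed // 60, time_elapsed % 60))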

I have verified that:

  • Both versions use the same Optimizer (AdaDelta)
  • Both versions have around the same number of trainable parameters (1.2 million)
  • I removed the normalization from the DataLoader transforms, leaving just a ToTensor() call.
  • pin_memory is set to True, and num_workers is set to 1 for the PyTorch code (a sketch of the loader setup is shown after this list).
  • Per the suggestion of Timbus Calin, I set max_queue_size to 1 and the results are identical.
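
For reference, the loaders are set up roughly like this (a sketch based on the official MNIST example; the exact kwargs in my script may differ slightly):

from torchvision import datasets, transforms
import torch

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.ToTensor()),  # normalization removed, ToTensor only
    batch_size=64, shuffle=True,
    num_workers=1, pin_memory=True)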

The Keras version runs at around 4-5 seconds per epoch while the PyTorch version runs at around 9-10 seconds per epoch.

Why is this and how can I improve this time?

asked Feb 02 '20 by Ganea Dan Andrei
1 Answer

I think there is a subtle difference that must be taken into consideration; my best hunch is the following: it is not the per-batch processing time on the GPU itself, but the max_queue_size parameter, which defaults to 10 in Keras.

Since a plain for-loop in PyTorch does not queue data by default, while Keras maintains an internal queue of prepared batches, Keras can feed the GPU faster: much less time is spent waiting for data, because the GPU consumes batches straight from that queue and the overhead of transferring data from CPU to GPU is hidden.

Apart from my former observation, I cannot see any other visible difference; maybe other people can point out new findings.
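
If you want to get closer to that queued behaviour on the PyTorch side, the usual levers are more DataLoader workers, prefetching, and non-blocking host-to-device copies. A rough sketch, assuming train_dataset stands for the dataset object in your script and that you are on a reasonably recent PyTorch (prefetch_factor and persistent_workers need >= 1.7); the best worker count is machine-dependent:

import torch

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=64, shuffle=True,
    num_workers=4,            # several workers so batch loading overlaps with training
    pin_memory=True,          # page-locked memory allows faster, asynchronous H2D copies
    prefetch_factor=2,        # batches pre-loaded per worker (PyTorch >= 1.7)
    persistent_workers=True)  # keep workers alive between epochs (PyTorch >= 1.7)

# Inside the training loop, let the copy overlap with computation:
data = data.to(device, non_blocking=True)
target = target.to(device, non_blocking=True)

With pinned memory plus non_blocking=True, the CPU-to-GPU copy can run concurrently with preceding GPU work, which is essentially what the Keras queue buys you.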

answered Oct 22 '22 by Timbus Calin