Whenever I export a fastai model and reload it, I get this error (or a very similar one) when I try to use the reloaded model to generate predictions on a new test set:
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
Minimal reproducible code example below; you just need to update your FILES_DIR variable to wherever the MNIST data gets deposited on your system:
from fastai import *
from fastai.vision import *
# download data for reproducible example
untar_data(URLs.MNIST_SAMPLE)
FILES_DIR = '/home/mepstein/.fastai/data/mnist_sample' # this is where command above deposits the MNIST data for me
# Create FastAI databunch for model training
tfms = get_transforms()
tr_val_databunch = ImageDataBunch.from_folder(path=FILES_DIR,  # location of downloaded data shown in log of prev command
                                              train='train',
                                              valid_pct=0.2,
                                              ds_tfms=tfms).normalize()
# Create Model
conv_learner = cnn_learner(tr_val_databunch,
                           models.resnet34,
                           metrics=[error_rate]).to_fp16()
# Train Model
conv_learner.fit_one_cycle(4)
# Export Model
conv_learner.export() # saves model as 'export.pkl' in path associated with the learner
# Reload Model and use it for inference on new hold-out set
reloaded_model = load_learner(path=FILES_DIR,
                              test=ImageList.from_folder(path=f'{FILES_DIR}/valid'))
preds = reloaded_model.get_preds(ds_type=DatasetType.Test)
Output:
"RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same"
Stepping through the code statement by statement, everything works fine until the last line, preds = ..., which is where the torch error above pops up.
Relevant software versions:
Python 3.7.3
fastai 1.0.57
torch 1.2.0
torchvision 0.4.0
So the answer to this ended up being relatively simple:
1) As noted in my comment, training in mixed precision mode (setting conv_learner to_fp16()) caused the error with the exported/reloaded model.
2) To train in mixed precision mode (which is faster than regular training) and enable export/reload of the model without errors, simply set the model back to default precision before exporting.
In code, it's simply a matter of changing the example above from:
# Export Model
conv_learner.export()
to:
# Export Model (after converting back to default precision for safe export/reload)
conv_learner = conv_learner.to_fp32()
conv_learner.export()
...and now the full (reproducible) code example above runs without errors, including the prediction after model reload.
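If you want to be sure which precision the learner is in before exporting, an optional sanity check is to inspect the dtype of its parameters. A minimal sketch using the learner from the example above (the exact dtype you see may vary by layer, since fastai keeps batch norm in float32 during mixed precision training):
# Optional sanity check before exporting: inspect the parameter dtype.
# While training in mixed precision the (non-batchnorm) weights are typically torch.float16.
print(next(conv_learner.model.parameters()).dtype)

conv_learner = conv_learner.to_fp32()                # back to default precision
print(next(conv_learner.model.parameters()).dtype)   # torch.float32 -> safe to export
conv_learner.export()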
Your model is in half precision if you called .to_fp16 on it, which is the same as calling model.half() in plain PyTorch. In fact, if you trace the fastai code, .to_fp16 ends up calling model.half().
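You can see the same effect in plain PyTorch; a minimal sketch with a toy nn.Linear module (the module and sizes here are just for illustration):
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                   # toy module, float32 by default
print(next(model.parameters()).dtype)     # torch.float32
model.half()                              # the same conversion .to_fp16 performs on the weights
print(next(model.parameters()).dtype)     # torch.float16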
But there is a catch: if you also convert the batch norm layers to half precision, you may run into convergence problems.
This is why you would typically do this in PyTorch:
model.half()  # convert all parameters to half precision
for layer in model.modules():
    if isinstance(layer, torch.nn.modules.batchnorm._BatchNorm):
        layer.float()  # keep batch norm layers in full precision
This converts every layer to half precision except the batch norm layers.
Note that the code from the PyTorch forum is also OK, but it only handles nn.BatchNorm2d.
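For comparison, a sketch of that forum-style variant (the helper name here is hypothetical), which only restores nn.BatchNorm2d and would leave any 1d/3d batch norm layers in half precision:
import torch
import torch.nn as nn

def half_except_bn2d(model: nn.Module) -> nn.Module:
    """Forum-style variant: convert to half, but only BatchNorm2d is kept in float32."""
    model.half()
    for layer in model.modules():
        if isinstance(layer, nn.BatchNorm2d):  # misses BatchNorm1d / BatchNorm3d
            layer.float()
    return model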
Then make sure your input is also in half precision, using to() like this:
import torch
t = torch.tensor(10.)
print(t)
print(t.dtype)
t = t.to(dtype=torch.float16)
print(t)
print(t.dtype)
# tensor(10.)
# torch.float32
# tensor(10., dtype=torch.float16)
# torch.float16
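Putting the two halves together: the weights and the input must have the same dtype, otherwise you get exactly the RuntimeError from the question. A minimal sketch (assuming a CUDA device, since the error message mentions cuda tensors; the layer and tensor sizes are just for illustration):
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3).cuda().half()  # half-precision weights on the GPU
x = torch.randn(1, 3, 32, 32, device='cuda')         # float32 input -> would raise the RuntimeError
x = x.to(dtype=torch.float16)                         # convert the input to match the weights
out = conv(x)
print(out.dtype)  # torch.float16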