
AssertionError: No inf checks were recorded for this optimizer in PyTorch's automatic mixed precision

I'm using PyTorch's automatic mixed precision (AMP) feature to train a network with a smaller memory footprint and reduced precision.
At a certain point, some embeddings produced by the network contain NaNs, so I'd like to replace those with 0s in order to perform online hard negative sample mining.

However, after replacing the NaNs in the tensor like this:

tensor[torch.isnan(tensor)] = 0

I get the following error on the next scaler step (scaler.step(optimizer)):

    assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.

What's the correct way to zero out NaNs while getting rid of this error?

Jjang asked Nov 07 '22 03:11



1 Answer

Could you show us your full code? Generally, it is advisable to just skip the step (batch) if it has NaNs.
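
Without the full code, I can only offer a rough sketch of what skipping a batch might look like in an AMP loop. Everything here is a placeholder rather than your setup: the `nn.Linear` model, the MSE loss, the `train_step` helper, and the CPU fallback with AMP disabled are all assumptions made so the example runs anywhere.

```python
import torch
from torch import nn

# Assumption: fall back to CPU with AMP disabled so the sketch runs anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

model = nn.Linear(4, 2).to(device)  # stand-in for the embedding network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(inputs, targets):
    """Run one AMP step; return False if the batch was skipped."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=use_amp):
        embeddings = model(inputs)
        loss = nn.functional.mse_loss(embeddings.float(), targets)
    # Skip the whole update: no backward(), no scaler.step(), no scaler.update().
    if not torch.isfinite(embeddings).all():
        return False
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return True


good = torch.full((8, 4), 0.1, device=device)
bad = good.clone()
bad[0, 0] = float("nan")  # simulate a batch that produces NaN embeddings
targets = torch.zeros(8, 2, device=device)

weights_before = model.weight.detach().clone()
ran_bad = train_step(bad, targets)    # batch is skipped, weights untouched
weights_after_skip = model.weight.detach().clone()
ran_good = train_step(good, targets)  # normal optimizer step
```

The point of returning early is that `scaler.step(optimizer)` is only ever reached after a `scaler.scale(loss).backward()` call in the same iteration.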

Also, take a look at torch.nan_to_num, which replaces NaNs without modifying the tensor in place.
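
A minimal sketch of `torch.nan_to_num` on a toy tensor (the values are made up for illustration). It returns a new tensor, so the result stays part of the autograd graph. Note that by default it also maps positive and negative infinity to the largest and smallest finite values of the dtype.

```python
import torch

# Toy embeddings with two NaN entries (illustrative values only).
embeddings = torch.tensor(
    [[0.5, float("nan")], [float("nan"), -1.0]],
    requires_grad=True,
)

# Out-of-place replacement: NaN -> 0.0, finite values are left unchanged.
cleaned = torch.nan_to_num(embeddings, nan=0.0)

# The cleaned tensor can still be used in a loss and backpropagated through.
cleaned.sum().backward()
```

This only cleans up the forward values. It does not address why the NaNs appear in the first place, so skipping the affected batch may still be the safer option.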

FarisHijazi answered Nov 11 '22 14:11
