I'm using the automatic mixed precision (AMP) feature of PyTorch to train a network with a smaller memory footprint at reduced precision.
At a certain point some embeddings produced by the network contain NaNs, so I'd like to replace those with 0s in order to perform online hard negative sample mining.
However, after replacing the NaNs in the tensor like this:
tensor[torch.isnan(tensor)] = 0
I get the following error during the next scaler step (scaler.step(optimizer)):
assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were recorded for this optimizer."
AssertionError: No inf checks were recorded for this optimizer.
What's the correct way to zero out the NaNs while avoiding this error?
Could you show us your full code? Generally it is advisable to just skip the step (batch) if it has NaNs.
Also take a look at torch.nan_to_num.
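Here is a minimal sketch combining both suggestions. The model, optimizer, and data below are toy stand-ins for your actual setup, not your code, and it assumes a CUDA device since GradScaler targets CUDA training:

import torch
import torch.nn as nn

model = nn.Linear(8, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    inputs = torch.randn(16, 8, device="cuda")
    targets = torch.randn(16, 4, device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        embeddings = model(inputs)
        # torch.nan_to_num replaces NaNs with 0.0 (and +/-inf with
        # large finite values) in one call, instead of the
        # boolean-mask assignment from the question.
        embeddings = torch.nan_to_num(embeddings)
        loss = criterion(embeddings, targets)

    # If the loss is still NaN, skip the whole batch: calling
    # scaler.step(optimizer) without a matching backward pass for this
    # optimizer is what triggers the assertion in the question.
    if torch.isnan(loss):
        continue

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

The key point is that backward, step, and update are skipped together, so scaler.step() is never reached without the inf checks that scaler.scale(loss).backward() records.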