I have written the PyTorch code for the fit function of my network, but when I use tqdm in the loop inside it, the progress bar never moves past 0%, and I cannot understand why.
Here is the code:
import time

import torch
import torch.nn as nn
from torch.optim import AdamW  # or transformers.AdamW, depending on your setup
from torch.utils.data import DataLoader
from sklearn.metrics import roc_auc_score
from transformers import get_linear_schedule_with_warmup
from tqdm.notebook import tqdm


def fit(model, train_dataset, val_dataset, epochs=1, batch_size=32, warmup_prop=0, lr=5e-5):
    device = torch.device('cuda:1')
    model.to(device)

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

    optimizer = AdamW(model.parameters(), lr=lr)
    num_warmup_steps = int(warmup_prop * epochs * len(train_loader))
    num_training_steps = epochs * len(train_loader)
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps)

    loss_fct = nn.BCEWithLogitsLoss(reduction='mean').to(device)

    for epoch in range(epochs):
        model.train()
        start_time = time.time()
        optimizer.zero_grad()
        avg_loss = 0

        # This is the progress bar that stays at 0%
        for step, (x, y_batch) in tqdm(enumerate(train_loader), total=len(train_loader)):
            y_pred = model(x.to(device))
            loss = loss_fct(y_pred.view(-1).float(), y_batch.float().to(device))
            loss.backward()
            avg_loss += loss.item() / len(train_loader)
            optimizer.step()
            scheduler.step()
            model.zero_grad()
            optimizer.zero_grad()

        model.eval()
        preds = []
        truths = []
        avg_val_loss = 0.

        with torch.no_grad():
            for x, y_batch in val_loader:
                y_pred = model(x.to(device))
                loss = loss_fct(y_pred.detach().view(-1).float(), y_batch.float().to(device))
                avg_val_loss += loss.item() / len(val_loader)
                probs = torch.sigmoid(y_pred).detach().cpu().numpy()
                preds += list(probs.flatten())
                truths += list(y_batch.numpy().flatten())

        score = roc_auc_score(truths, preds)
        dt = time.time() - start_time
        lr = scheduler.get_last_lr()[0]
        print(f'Epoch {epoch + 1}/{epochs} \t lr={lr:.1e} \t t={dt:.0f}s \t loss={avg_loss:.4f} \t val_loss={avg_val_loss:.4f} \t val_auc={score:.4f}')
Output
The output after executing the fit function with the required parameters looks like this:

0%|          | 0/6986 [00:00<?, ?it/s]
How can I fix this?
As you are importing from tqdm.notebook, I take it you're using a Jupyter notebook, right? If not, you have to use from tqdm import tqdm instead.
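If you are not sure which environment the code will end up running in, tqdm also provides tqdm.auto, which selects the right frontend at import time; a minimal sketch:

from tqdm.auto import tqdm  # notebook widget inside Jupyter, text bar in a plain console

for _ in tqdm(range(100)):
    pass  # your loop body here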
I simplified your example code to make it really minimal, like this:
import time
from tqdm.notebook import tqdm

l = [None] * 10000
for i, e in tqdm(enumerate(l), total=len(l)):
    time.sleep(0.01)
and executed it in a Google Colab Jupyter notebook. It showed me a nice graphical progress bar.
So tqdm works correctly in notebook mode, which means the problem is in your iterable or loop code, not in tqdm. One possible reason is that a single iteration of your inner loop takes so long that even the first step (out of 6986 total in your case) never completes, so the bar never moves. Another possible reason is that your iterable takes forever to produce its second element; you should check that as well, as in the sketch below.
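To check that quickly, you can time the first few batches by hand; a minimal sketch, assuming train_loader is the DataLoader built inside your fit function:

import time

it = iter(train_loader)  # the DataLoader from the question
for i in range(3):
    t0 = time.time()
    x, y_batch = next(it)  # blocks until the loader yields a batch
    print(f'batch {i} fetched in {time.time() - t0:.2f}s')

If the first or second fetch takes an extremely long time, the bottleneck is the loader, not the progress bar.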
Also, I see that you posted an ASCII progress bar, which is not what a notebook usually shows (the notebook normally renders a graphical bar). So maybe you're not inside a notebook at all? In that case you have to use from tqdm import tqdm instead of from tqdm.notebook import tqdm.
Also, first try to simplify your code, just temporarily, to figure out whether the problem really lies with the tqdm module in your case and not with your iterable or loop code. Try starting from my code provided above.
Also, instead of tqdm, try just printing something like print(step) inside your loop: does it print at least two lines on the screen?
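For example, a stripped-down version of your training loop (tqdm removed; train_loader as in your question) makes it obvious whether iteration ever advances past the first step:

import time

start = time.time()
for step, (x, y_batch) in enumerate(train_loader):
    print(f'step {step} reached after {time.time() - start:.1f}s')
    if step >= 2:  # a couple of iterations is enough for this test
        break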
If in my code I do from tqdm import tqdm and then execute it in console Python, I get:
10%|███████████▉ | 950/10000 [00:14<02:20, 64.37it/s]
which means that the console version works too.
This can also happen in Jupyter if the notebook is not trusted: in that case, click the "Not Trusted" box in the upper right corner.