
How to print the progress of a list comprehension in Python?

In my method I have to return a list of lists. I would like to use a list comprehension for performance reasons, since the list takes about 5 minutes to create.

[[token.text for token in document] for document in doc_collection]

Is there a way to print the progress, i.e. which document is currently being processed? Something like this:

[[token.text for token in document] 
  and print(progress) for progress, document in enumerate(doc_collection)]

Thanks for your help!

rakael asked Jun 08 '18 08:06


3 Answers

tqdm

Use the tqdm package, a fast and versatile progress bar utility:

pip install tqdm

from tqdm import tqdm

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [process(token) for token in tqdm(l1)]
100%|███████████████████████████████████| 5000/5000 [00:00<00:00, 2326807.94it/s]
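
Applied to the nested comprehension from the question, a minimal sketch could look like the following. The small `doc_collection` of plain integers is a stand-in for the asker's real documents, and `str(token)` stands in for `token.text`. The `try`/`except` fallback is only an assumption added here so the sketch still runs (without a progress bar) when tqdm is not installed.

```python
try:
    from tqdm import tqdm
except ImportError:
    # Assumed fallback for this sketch only: no progress bar, items pass through.
    def tqdm(iterable, **kwargs):
        return iterable

# Stand-in data; in the question each document is a sequence of tokens.
doc_collection = [[1, 2], [3, 4], [5, 6]]

# Wrap only the outer iterable, so the bar advances once per document.
result = [[str(token) for token in document]
          for document in tqdm(doc_collection, desc="documents")]
```

Wrapping the outer loop keeps the overhead at one progress update per document rather than one per token.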

Without additional dependencies

1/ Use a side function

def report(index):
    if index % 1000 == 0:
        print(index)

def process(token, index, report=None):
    if report:
        report(index) 
    return token['text']

l1 = [{'text': k} for k in range(5000)]

l2 = [process(token, i, report) for i, token in enumerate(l1)]

2/ Use the and and or operators

def process(token):
    return token['text']

l1 = [{'text': k} for k in range(5000)]
l2 = [(i % 1000 == 0 and print(i)) or process(token) for i, token in enumerate(l1)]

3/ Use both

def process(token):
    return token['text']

def report(i):
    i % 1000 == 0 and print(i)

l1 = [{'text': k} for k in range(5000)]
l2 = [report(i) or process(token) for i, token in enumerate(l1)]

All 3 methods print:

0
1000
2000
3000
4000

How 2 works

  • i % 1000 == 0 and print(i): and evaluates its second operand only if the first one is truthy, so print(i) runs only when i % 1000 == 0.
  • or process(token): or returns its first operand if that is truthy; otherwise it evaluates and returns its second operand. Here the first operand is always falsy, so process(token) is always evaluated and added to the list.
    • If i % 1000 != 0, the first operand is False.
    • Otherwise, the first operand is None (because print returns None).
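
The short-circuit behaviour described above can be checked with a small sketch. `noisy_print` is a hypothetical stand-in for `print` that records its calls and, like `print`, returns `None`; the token value is made up for illustration.

```python
def process(token):
    return token["text"]

calls = []

def noisy_print(i):
    # Hypothetical stand-in for print: records the call and returns None.
    calls.append(i)
    return None

token = {"text": "hello"}

# `and` evaluates its right operand only when the left one is truthy.
left_when_skipped = (7 % 1000 == 0 and noisy_print(7))          # left is False, no call
left_when_reported = (1000 % 1000 == 0 and noisy_print(1000))   # call happens, yields None

# `or` falls through to its right operand because the left one is falsy both times.
value_a = left_when_skipped or process(token)
value_b = left_when_reported or process(token)

# Even a falsy result survives, because `or` returns its last operand as-is.
empty_result = None or []
```

In both branches the value that ends up in the list is `process(token)`; the only difference is whether the reporting call ran.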

How 3 works

This works like 2: report(i) has no return statement, so it returns None, and the or expression therefore evaluates to process(token), which is added to the list.

ted answered Oct 21 '22 18:10


doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]

result = [print(progress) or
          [str(token) for token in document]
          for progress, document in enumerate(doc_collection)]

print(result)  # [['1', '2'], ['3', '4'], ['5', '6']]

I don't consider this good or readable code, but the idea is fun.

It works because print always returns None so print(progress) or x will always be x (by the definition of or).
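
If readability matters more than the trick, one possible alternative is to move the reporting into a small generator that wraps the iterable. This is only a sketch: `with_progress` and its `every` and `stream` parameters are hypothetical names introduced here, not part of any library.

```python
import sys

def with_progress(iterable, every=1, stream=sys.stderr):
    """Yield items unchanged, printing the index of every `every`-th item."""
    for index, item in enumerate(iterable):
        if index % every == 0:
            print(index, file=stream)
        yield item

doc_collection = [[1, 2],
                  [3, 4],
                  [5, 6]]

# The comprehension itself stays free of side effects.
result = [[str(token) for token in document]
          for document in with_progress(doc_collection)]
```

The comprehension then reads the same as the one without progress output, and the reporting interval can be changed in one place.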

Alex Hall answered Oct 21 '22 17:10


Just do:

from time import sleep
from tqdm import tqdm

def foo(i):
    sleep(0.01)
    return i

[foo(i) for i in tqdm(range(1000))]

For Jupyter notebook:

from tqdm.notebook import tqdm

noyk answered Oct 21 '22 17:10