Using click.progressbar with multiprocessing in Python

I have a huge list that I need to process, which takes some time, so I divide it into 4 pieces and multiprocess each piece with some function. It still takes a bit of time to run with 4 cores, so I figured I would add some progress bar to the function, so that it could tell me where each processor is at in processing the list.

My dream was to have something like this:

erasing close atoms, cpu0  [######..............................]  13%
erasing close atoms, cpu1  [#######.............................]  15%
erasing close atoms, cpu2  [######..............................]  13%
erasing close atoms, cpu3  [######..............................]  14%

with each bar moving as the loop in the function progresses. Instead, I get a continuous stream of progress bar lines that fills my terminal window.

Here is the main python script that calls the function:

from eraseCloseAtoms import *
from readPDB import *
import multiprocessing as mp
from vectorCalc import *

prot, cell = readPDB('file')
atoms = vectorCalc(cell)

output = mp.Queue()

# setup mp to erase grid atoms that are too close to the protein (dmin = 2.5A)
cpuNum = 4
tasks = len(atoms)
rangeSet = [tasks / cpuNum for i in range(cpuNum)]
for i in range(tasks % cpuNum):
    rangeSet[i] += 1

rangeSet = np.array(rangeSet)

processes = []
for c in range(cpuNum):
    na, nb = (int(np.sum(rangeSet[:c] + 1)), int(np.sum(rangeSet[:c + 1])))
    processes.append(mp.Process(target=eraseCloseAtoms, args=(prot, atoms[na:nb], cell, 2.7, 2.5, output)))

for p in processes:
    p.start()

results = [output.get() for p in processes]

for p in processes:
    p.join()

atomsNew = results[0] + results[1] + results[2] + results[3]

Below is the function eraseCloseAtoms():

import numpy as np
import click


def eraseCloseAtoms(protein, atoms, cell, spacing=2, dmin=1.4, output=None):
    print 'just need to erase close atoms'

    if dmin > spacing:
        print 'the spacing needs to be larger than dmin'
        return

    grid = [int(cell[0] / spacing), int(cell[1] / spacing), int(cell[2] / spacing)]

    selected = list(atoms)
    with click.progressbar(length=len(atoms), label='erasing close atoms') as bar:
        for i, atom in enumerate(atoms):
            bar.update(i)
            erased = False
            coord = np.array(atom[6])

            for ix in [-1, 0, 1]:
                if erased:
                    break
                for iy in [-1, 0, 1]:
                    if erased:
                        break
                    for iz in [-1, 0, 1]:
                        if erased:
                            break
                        for j in protein:
                            protCoord = np.array(protein[int(j)][6])
                            trueDist = getMinDist(protCoord, coord, cell, vectors)
                            if trueDist <= dmin:
                                selected.remove(atom)
                                erased = True
                                break
    if output is None:
        return selected
    else:
        output.put(selected)

asked Aug 18 '15 at 18:08 by sodiumnitrate

3 Answers

The accepted answer says this isn't possible with click and that it would require a "non-trivial amount of code to make it work".

While that's true, there is another module that offers this functionality out of the box: tqdm (https://github.com/tqdm/tqdm), which does exactly what you need.

The docs show how to do nested progress bars (https://github.com/tqdm/tqdm#nested-progress-bars) and more.
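
As a rough sketch of how this could look for the question's four workers: tqdm's `position` argument pins each bar to its own terminal line, and a lock shared through the pool initializer keeps the workers' writes from interleaving. The names `split`, `process_chunk`, and `run` are made up for this illustration, and the even-number test is only a placeholder for the real distance check.

```python
import multiprocessing as mp

from tqdm import tqdm


def split(items, n):
    """Cut items into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_chunk(job):
    """Worker: filter one chunk while drawing its own bar on its own line."""
    position, chunk = job
    kept = []
    # position=N pins this bar to line N so the bars do not overwrite each other.
    for item in tqdm(chunk, desc=f"erasing close atoms, cpu{position}",
                     position=position):
        if item % 2 == 0:  # placeholder for the real distance check
            kept.append(item)
    return kept


def run(items, n_workers=4):
    """Process items in n_workers processes, one progress bar per process."""
    jobs = list(enumerate(split(items, n_workers)))
    # Share one lock between the workers so their terminal writes stay intact.
    tqdm.set_lock(mp.RLock())
    with mp.Pool(n_workers, initializer=tqdm.set_lock,
                 initargs=(tqdm.get_lock(),)) as pool:
        parts = pool.map(process_chunk, jobs)
    return [item for part in parts for item in part]
```

Calling `run(list(range(1000)))` from under an `if __name__ == "__main__":` guard should show four bars, one per worker, each updating in place.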

answered Oct 10 '22 at 22:10 by Łukasz Rysiak


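For anybody coming to this later: I created the following, which seems to work okay. It overrides click's ProgressBar fairly minimally, although I had to override an entire method just to change a few lines at the bottom of it. It uses the ANSI escape sequences \x1b[1A\x1b[2K (move the cursor up one line, then erase that line) to clear the progress bars before rewriting them, so it may be environment dependent. It also imports from click's underscore-prefixed modules (click._termui_impl and click._compat), so it may depend on the installed click version.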

#!/usr/bin/env python
import time
from typing import Dict

import click
from click._termui_impl import ProgressBar as ClickProgressBar, BEFORE_BAR
from click._compat import term_len


class ProgressBar(ClickProgressBar):
    def render_progress(self, in_collection=False):
        # This is basically a copy of the default render_progress with the addition of in_collection
        # param which is only used at the very bottom to determine how to echo the bar
        from click.termui import get_terminal_size

        if self.is_hidden:
            return

        buf = []
        # Update width in case the terminal has been resized
        if self.autowidth:
            old_width = self.width
            self.width = 0
            clutter_length = term_len(self.format_progress_line())
            new_width = max(0, get_terminal_size()[0] - clutter_length)
            if new_width < old_width:
                buf.append(BEFORE_BAR)
                buf.append(" " * self.max_width)
                self.max_width = new_width
            self.width = new_width

        clear_width = self.width
        if self.max_width is not None:
            clear_width = self.max_width

        buf.append(BEFORE_BAR)
        line = self.format_progress_line()
        line_len = term_len(line)
        if self.max_width is None or self.max_width < line_len:
            self.max_width = line_len

        buf.append(line)
        buf.append(" " * (clear_width - line_len))
        line = "".join(buf)
        # Render the line only if it changed.

        if line != self._last_line and not self.is_fast():
            self._last_line = line
            click.echo(line, file=self.file, color=self.color, nl=in_collection)
            self.file.flush()
        elif in_collection:
            click.echo(self._last_line, file=self.file, color=self.color, nl=in_collection)
            self.file.flush()


class ProgressBarCollection(object):
    def __init__(self, bars: Dict[str, ProgressBar], bar_template=None, width=None):
        self.bars = bars
        if bar_template or width:
            for bar in self.bars.values():
                if bar_template:
                    bar.bar_template = bar_template
                if width:
                    bar.width = width

    def __enter__(self):
        self.render_progress()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.render_finish()

    def render_progress(self, clear=False):
        if clear:
            self._clear_bars()
        for bar in self.bars.values():
            bar.render_progress(in_collection=True)

    def render_finish(self):
        for bar in self.bars.values():
            bar.render_finish()

    def update(self, bar_name: str, n_steps: int):
        self.bars[bar_name].make_step(n_steps)
        self.render_progress(clear=True)

    def _clear_bars(self):
        for _ in range(0, len(self.bars)):
            click.echo('\x1b[1A\x1b[2K', nl=False)


def progressbar_collection(bars: Dict[str, ProgressBar]):
    return ProgressBarCollection(bars, bar_template="%(label)s  [%(bar)s]  %(info)s", width=36)


@click.command()
def cli():
    with click.progressbar(length=10, label='bar 0') as bar:
        for i in range(0, 10):
            time.sleep(1)
            bar.update(1)
    click.echo('------')
    with ProgressBar(iterable=None, length=10, label='bar 1', bar_template="%(label)s  [%(bar)s]  %(info)s") as bar:
        for i in range(0, 10):
            time.sleep(1)
            bar.update(1)
    click.echo('------')
    bar2 = ProgressBar(iterable=None, length=10, label='bar 2')
    bar3 = ProgressBar(iterable=None, length=10, label='bar 3')
    with progressbar_collection({'bar2': bar2, 'bar3': bar3}) as bar_collection:
        for i in range(0, 10):
            time.sleep(1)
            bar_collection.update('bar2', 1)
        for i in range(0, 10):
            time.sleep(1)
            bar_collection.update('bar3', 1)


if __name__ == "__main__":
    cli()
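
The clearing trick used in `_clear_bars` does not depend on click. As a minimal sketch, assuming a terminal that honours ANSI escape codes: emitting the pair "cursor up, erase line" once per bar rewinds the cursor to the top of the block, after which every bar can be printed again. The helper names `redraw` and `demo` are hypothetical.

```python
import sys
import time

CURSOR_UP = "\x1b[1A"    # move the cursor up one line
ERASE_LINE = "\x1b[2K"   # erase the whole line the cursor is on


def redraw(lines, previous_count=0):
    """Return the text that replaces previous_count already printed lines.

    Each previously printed line is assumed to have ended with a newline,
    so the cursor starts on the empty line just below the block.
    """
    rewind = (CURSOR_UP + ERASE_LINE) * previous_count
    return rewind + "".join(line + "\n" for line in lines)


def demo(n_bars=3, steps=10, width=20):
    """Animate a few bars that advance at different speeds."""
    printed = 0
    for step in range(steps + 1):
        lines = []
        for b in range(n_bars):
            done = min(steps, step * (b + 1))
            filled = width * done // steps
            bar = "#" * filled + "." * (width - filled)
            lines.append(f"bar {b}  [{bar}]  {100 * done // steps}%")
        sys.stdout.write(redraw(lines, printed))
        sys.stdout.flush()
        printed = len(lines)
        time.sleep(0.1)
```

Running `demo()` in a terminal redraws the three bars in place. Any other output printed between two redraws shifts the block and breaks the rewind, which is the same fragility the answer mentions.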

answered Oct 10 '22 at 22:10 by Khrall


I see two issues in your code.

The first one explains why your progress bars are often showing 100% rather than their real progress. You're calling bar.update(i) which advances the bar's progress by i steps, when I think you want to be updating by one step. A better approach would be to pass the iterable to the progressbar function and let it do the updating automatically:

with click.progressbar(atoms, label='erasing close atoms') as bar:
    for atom in bar:
        erased = False
        coord = np.array(atom[6])

        # ...
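
To make the first issue concrete, here is a small sketch of the arithmetic with no click involved. It assumes, as described above, that `update(n)` adds `n` steps to the bar's position, and it models the bar as simply stopping at its length; `simulate_updates` is a made-up helper name.

```python
def simulate_updates(length, steps):
    """Return the bar position after each update, capped at length.

    Models a progress bar whose update(n) call advances the position by n.
    """
    pos = 0
    history = []
    for n in steps:
        pos = min(length, pos + n)
        history.append(pos)
    return history


length = 100

# What the question does: bar.update(i) inside enumerate(), so the
# increments are 0, 1, 2, ... and the position grows quadratically.
cumulative = simulate_updates(length, range(length))
full_at = cumulative.index(length)  # first loop index at which the bar is full

# What was intended: advance by one step per processed item.
one_by_one = simulate_updates(length, [1] * length)

print(f"update(i): bar is full at loop index {full_at} of {length}")
print(f"update(1): bar is full at loop index {one_by_one.index(length)} of {length}")
```

With 100 items, the `update(i)` version fills the bar at loop index 14 (0 + 1 + ... + 14 = 105), while stepping by one fills it only at the last index, 99.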

However, this still won't work with multiple processes iterating at once, each with its own progress bar, because of the second issue with your code. The click.progressbar documentation states the following limitation:

No printing must happen or the progress bar will be unintentionally destroyed.

This means that whenever one of your progress bars updates itself, it will break all of the other active progress bars.

I don't think there is an easy fix for this. It's very hard to interactively update multi-line console output (you basically need to use curses or a similar "console GUI" library with support from your OS). The click module does not have that capability; it can only update the current line. Your best hope would probably be to extend the click.progressbar design to output multiple bars in columns, like:

CPU1: [######      ] 52%   CPU2: [###        ] 30%    CPU3: [########  ] 84%

This would require a non-trivial amount of code to make it work (especially when the updates are coming from multiple processes), but it's not completely impractical.
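
One way to approach the single-line layout is to let only the parent process write to the terminal: each worker reports finished items through a queue, and the parent redraws one line containing every bar. The following is a rough sketch of that idea using only the standard library rather than click. The names `render_line`, `worker`, and `run_demo` are illustrative, and the `time.sleep` call stands in for the real per-atom work.

```python
import multiprocessing as mp
import sys
import time


def render_line(done, totals, width=12):
    """Format every worker's progress as one line of side-by-side bars."""
    parts = []
    for cpu, (d, total) in enumerate(zip(done, totals)):
        # Integer arithmetic avoids floating-point rounding surprises.
        pct = 100 * d // total if total else 100
        filled = width * d // total if total else width
        bar = "#" * filled + " " * (width - filled)
        parts.append(f"CPU{cpu}: [{bar}] {pct:3d}%")
    return "   ".join(parts)


def worker(cpu, n_items, progress):
    """Pretend to process n_items, sending one message per finished item."""
    for _ in range(n_items):
        time.sleep(0.01)   # placeholder for the real work
        progress.put(cpu)


def run_demo(totals=(30, 20, 25, 15)):
    """Start one worker per entry in totals and draw their bars on one line."""
    progress = mp.Queue()
    procs = [mp.Process(target=worker, args=(cpu, n, progress))
             for cpu, n in enumerate(totals)]
    for p in procs:
        p.start()

    done = [0] * len(totals)
    # Only the parent prints; "\r" returns to the start of the same line.
    for _ in range(sum(totals)):
        done[progress.get()] += 1
        sys.stdout.write("\r" + render_line(done, totals))
        sys.stdout.flush()
    sys.stdout.write("\n")

    for p in procs:
        p.join()
    return done
```

Calling `run_demo()` from under an `if __name__ == "__main__":` guard draws four bars on a single line. Because the workers never print, nothing can destroy the line between updates. The sketch assumes every worker finishes; a worker that crashes would leave the parent waiting on the queue.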

answered Oct 10 '22 at 22:10 by Blckknght