I'm using Mydia to extract random frames from videos. Because I have a lot of videos, I want to parallelize this workflow while maintaining repeatability. mydia.Videos accepts a random seed, which is important for ensuring repeatability. Now I need to work on the parallelization piece.
Given n videos and a random seed, r, how can I ensure that the extracted frames for each video are the same regardless of the number of workers? I'm particularly interested in the algorithmic component, not necessarily the code.
My initial thought was to use multiprocessing.Pool. However, there will be a race condition in sampling the frames if the processes' completion times are non-deterministic; i.e., if proc 1 takes longer than proc 0, the frames sampled by the Videos class will be different than if proc 0 takes longer than proc 1.
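To make the order dependence concrete, here is a minimal sketch. It uses Python's random.Random as a stand-in for whatever sampler Videos uses internally (an assumption, I have not checked Mydia's internals), and sample_in_order, a.mp4, and b.mp4 are made-up names for illustration only.

```python
import random
from typing import Dict, List, Sequence


def sample_in_order(order: Sequence[str], seed: int,
                    total_frames: int = 10_000,
                    num_frames: int = 16) -> Dict[str, List[int]]:
    """Draw frames for each video from ONE shared generator, in the given order."""
    rng = random.Random(seed)  # stand-in for a single seeded sampler (assumption)
    return {name: sorted(rng.sample(range(total_frames), num_frames))
            for name in order}


# Same seed and same videos, but the shared stream is consumed in a different order.
first = sample_in_order(["a.mp4", "b.mp4"], seed=42)
second = sample_in_order(["b.mp4", "a.mp4"], seed=42)

# Does a.mp4 get the same frames in both runs?
print(first["a.mp4"] == second["a.mp4"])
# The draws themselves are unchanged; they are just handed to the other video.
print(first["a.mp4"] == second["b.mp4"])
```

The point is that a single seed only fixes the sequence of draws, not which video receives which draw, so anything that changes the consumption order changes the per-video result.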
My solution is a bit unorthodox because it's library-specific. Mydia allows passing the frames to extract instead of forcing the Videos client to sample them itself. This gives me the opportunity to precalculate the frames to sample in the parent process. By doing this, I can "mock" the randomness in the subprocesses by instantiating a new Videos with those frames. For instance:
from pathlib import Path
from typing import List

from mydia import Videos


class MySampler:
    def __init__(self, input_directory: Path, total_frames: int, num_frames: int, fps: int):
        self.input_directory = Path(input_directory)
        # One precomputed list of frame numbers per video, drawn in the parent process.
        # (__get_frame_numbers_for_each_video is a helper of this class, not shown here.)
        self.frames_per_video = [
            self.__get_frame_numbers_for_each_video(total_frames, num_frames, fps)
            for _ in self.input_directory.glob("*.mp4")
        ]

    @staticmethod
    def get_reader(num_frames: int, frames: List[int]):
        # The mode callable ignores its inputs and returns the frames it was constructed with.
        return Videos(target_size=(512, 512), num_frames=num_frames, mode=lambda *_: frames)
I can then parallelize the reading:
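    # Needs `from multiprocessing import Pool` at module level.
    def sample_frames(self, number_of_workers: int):
        # Sort the paths: glob() makes no ordering guarantee, and the pairing with
        # frames_per_video must be identical on every run.
        videos = sorted(self.input_directory.glob("*.mp4"))
        with Pool(processes=number_of_workers) as pool:
            # starmap blocks until every video is read and re-raises worker errors,
            # which a discarded starmap_async result would silently swallow.
            pool.starmap(self.read_video, zip(self.frames_per_video, videos))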
where read_video is the method that calls get_reader and does the reading.
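A possible alternative, sketched under assumptions rather than taken from Mydia: instead of precomputing every frame list in the parent, derive an independent seed for each video from the base seed r and the video's name, so a worker can compute its own frames without touching shared random state. The names derive_seed, frames_for_video, and sample_all are hypothetical, and random.Random again stands in for the real sampler.

```python
import hashlib
import random
from functools import partial
from multiprocessing import Pool
from typing import Dict, List, Sequence


def derive_seed(base_seed: int, video_name: str) -> int:
    """Map (base seed, video name) to a stable 64-bit seed."""
    # hashlib rather than hash(): str hashing is salted per interpreter run.
    digest = hashlib.sha256(f"{base_seed}:{video_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def frames_for_video(video_name: str, base_seed: int,
                     total_frames: int, num_frames: int) -> List[int]:
    """Pick frame indices using a generator private to this video."""
    rng = random.Random(derive_seed(base_seed, video_name))
    return sorted(rng.sample(range(total_frames), num_frames))


def sample_all(video_names: Sequence[str], base_seed: int, total_frames: int,
               num_frames: int, workers: int) -> Dict[str, List[int]]:
    """Compute frame indices for every video across `workers` processes."""
    pick = partial(frames_for_video, base_seed=base_seed,
                   total_frames=total_frames, num_frames=num_frames)
    with Pool(processes=workers) as pool:
        # pool.map returns results in input order, whatever the completion order.
        return dict(zip(video_names, pool.map(pick, video_names)))


if __name__ == "__main__":
    names = ["a.mp4", "b.mp4", "c.mp4"]  # hypothetical file names
    one = sample_all(names, base_seed=42, total_frames=300, num_frames=8, workers=1)
    many = sample_all(names, base_seed=42, total_frames=300, num_frames=8, workers=3)
    # Expected to be equal, since nothing here depends on scheduling.
    print(one == many)
```

Each resulting list could then be passed to get_reader exactly as above. The trade-off is that the frames are tied to the file name, so renaming a video changes its sample; keying on a content hash instead would avoid that at the cost of reading each file once more.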