Load .npy file with np.load progress bar

I have a really large .npy file (previously saved with np.save) and I am loading it with:

np.load(open('file.npy', 'rb'))

Is there any way to see the progress of the loading process? I know tqdm and some other libraries for monitoring progress, but I don't know how to use them for this problem.

Thank you!

asked Mar 09 '17 09:03

serchu

1 Answer

As far as I am aware, np.load does not provide any callbacks or hooks to monitor progress. However, there is a workaround that may help: np.load can open the file as a memory-mapped file (mmap_mode='r'), which means the data stays on disk and is loaded into memory only on demand. We can abuse this machinery to copy the data manually from the memory-mapped file into actual memory, using a loop whose progress can be monitored.

Here is an example with a crude progress monitor:

import numpy as np

x = np.random.randn(8096, 4096)
np.save('file.npy', x)

blocksize = 1024  # tune this for performance/granularity

mmap = np.load('file.npy', mmap_mode='r')  # map the file; the data stays on disk
try:
    y = np.empty_like(mmap)
    n_blocks = int(np.ceil(mmap.shape[0] / blocksize))
    for b in range(n_blocks):
        # copy one block of rows from disk into memory
        y[b*blocksize : (b+1)*blocksize] = mmap[b*blocksize : (b+1)*blocksize]
        print('progress: {}/{}'.format(b + 1, n_blocks))  # use any progress indicator
finally:
    del mmap  # drop the reference so the mapped file can be closed

assert np.all(y == x)

Plugging any progress-bar library into the loop should be straightforward.

I was unable to test this with exceptionally large arrays due to memory constraints, so I can't really tell if this approach has any performance issues.
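For what it's worth, here is a minimal sketch of how the same loop could be wrapped in a function that accepts a progress-bar wrapper. The names load_with_progress and wrap are made up for illustration and are not part of NumPy; wrap is assumed to be any callable that takes an iterable and returns an iterable, which is how tqdm.tqdm behaves. It also assumes the array has at least one dimension.

```python
import numpy as np


def load_with_progress(path, blocksize=1024, wrap=None):
    """Copy a .npy file into memory block by block along the first axis.

    `wrap` is a hypothetical hook (not a NumPy feature): a callable that
    takes an iterable and returns an iterable, e.g. tqdm.tqdm.
    """
    if wrap is None:
        wrap = lambda iterable: iterable  # no progress display by default

    src = np.load(path, mmap_mode='r')  # map the file; the data stays on disk
    try:
        out = np.empty(src.shape, dtype=src.dtype)  # plain in-memory array
        # range() has a length, so a progress bar can infer the total
        for start in wrap(range(0, src.shape[0], blocksize)):
            stop = start + blocksize
            out[start:stop] = src[start:stop]  # read one block from disk
    finally:
        del src  # drop the reference so the mapped file can be closed
    return out
```

If tqdm is installed, something like y = load_with_progress('file.npy', wrap=tqdm.tqdm) should show a bar that advances once per block; a smaller blocksize gives finer-grained updates at the cost of more loop iterations.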

answered Sep 21 '22 10:09

MB-F
