
What is the advantage of saving `.npz` files instead of `.npy` in python, regarding speed, memory and look-up?

The NumPy documentation for numpy.savez, which saves an .npz file, says:

The .npz file format is a zipped archive of files named after the variables they contain. The archive is not compressed and each file in the archive contains one variable in .npy format. [...]

When opening the saved .npz file with load a NpzFile object is returned. This is a dictionary-like object which can be queried for its list of arrays (with the .files attribute), and for the arrays themselves.
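As a minimal sketch of that dictionary-like behaviour (assuming NumPy is installed; the file name `data.npz` and the arrays are illustrative only): positional arguments to np.savez are stored as `arr_0`, `arr_1`, ..., keyword arguments under their own names, and the NpzFile reads each array from the archive only when it is accessed.

```python
import os
import tempfile

import numpy as np

x = np.arange(10)
y = np.linspace(0.0, 1.0, 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npz")  # hypothetical file name
    # x is passed positionally, so it is stored as "arr_0";
    # y is passed as a keyword, so it is stored under the name "y".
    np.savez(path, x, y=y)

    # np.load returns a dictionary-like NpzFile; using it as a context
    # manager closes the underlying zip file afterwards.
    with np.load(path) as data:
        names = sorted(data.files)  # member names without the ".npy" suffix
        x_loaded = data["arr_0"]    # each array is read only when accessed
        y_loaded = data["y"]

print(names)  # ['arr_0', 'y']
print(np.array_equal(x, x_loaded), np.array_equal(y, y_loaded))  # True True
```

This lazy loading is the main "look-up" benefit: opening a large .npz is cheap, and you only pay for the arrays you actually index.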

My question is: what is the point of numpy.savez?

Is it just a more elegant version (shorter command) to save multiple arrays, or is there a speed-up in the saving/reading process? Does it occupy less memory?

asked Jan 17 '19 at 15:01 by SuperCiocia




1 Answer

The answer has two parts.

I. NPY vs. NPZ

As the documentation already says, the .npy format is:

the standard binary file format in NumPy for persisting a single arbitrary NumPy array on disk. ... The format is designed to be as simple as possible while achieving its limited goals. (sources)
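To make "single arbitrary NumPy array" concrete, here is a small sketch (assuming NumPy is installed) that writes an array into an in-memory buffer and reads back the .npy header with the helpers in np.lib.format. The header is what lets np.load rebuild the array with the right shape, dtype and memory order on any machine.

```python
import io

import numpy as np

arr = np.arange(6, dtype=np.float64).reshape(2, 3)

buf = io.BytesIO()
np.save(buf, arr)  # writes a complete .npy image into the buffer
buf.seek(0)

# Every .npy file starts with the magic string b"\x93NUMPY" plus a version.
major, minor = np.lib.format.read_magic(buf)
# The header that follows records shape, memory order and dtype.
if (major, minor) == (1, 0):
    shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(buf)
else:
    shape, fortran_order, dtype = np.lib.format.read_array_header_2_0(buf)

print(buf.getvalue()[:6])           # b'\x93NUMPY'
print(shape, fortran_order, dtype)  # (2, 3) False float64
```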

And .npz is only a

simple way to combine multiple arrays into a single file, one can use ZipFile to contain multiple “.npy” files. We recommend using the file extension “.npz” for these archives. (sources)

So, .npz is just a ZipFile containing multiple “.npy” files. And this ZipFile can be either compressed (by using np.savez_compressed) or uncompressed (by using np.savez).
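You can check this with nothing but the standard zipfile module. A minimal sketch, assuming NumPy is installed (the file name `pair.npz` is illustrative): the archive lists one .npy member per array, and each member is a complete .npy file that np.load can read on its own.

```python
import io
import os
import tempfile
import zipfile

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.ones(4, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pair.npz")  # hypothetical file name
    np.savez(path, a=a, b=b)

    # The .npz file is an ordinary zip archive...
    is_zip = zipfile.is_zipfile(path)
    with zipfile.ZipFile(path) as zf:
        members = sorted(zf.namelist())  # one .npy member per array
        # ...and each member is a standalone .npy file, readable here
        # from an in-memory buffer.
        a_from_member = np.load(io.BytesIO(zf.read("a.npy")))

print(is_zip, members)  # True ['a.npy', 'b.npy']
print(np.array_equal(a, a_from_member))  # True
```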

It's similar to a tarball on Unix-like systems: a tarball can be a plain, uncompressed archive of other files, or it can be compressed by combining it with a compression program (gzip, bzip2, etc.).

II. Different APIs for binary serialization

NumPy provides separate APIs to produce these binary files:

  • np.save ---> Save an array to a binary file in NumPy .npy format
  • np.savez --> Save several arrays into a single file in uncompressed .npz format
  • np.savez_compressed --> Save several arrays into a single file in compressed .npz format
  • np.load --> Load arrays or pickled objects from .npy, .npz or pickled files
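A short sketch exercising all four functions in one place, assuming NumPy is installed; the file names are hypothetical. Note that np.load returns a plain ndarray for a .npy file but a dictionary-like NpzFile for either kind of .npz file, and that compressed and uncompressed archives load back identical data.

```python
import os
import tempfile

import numpy as np

arr = np.random.default_rng(0).random((3, 4))

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "single.npy")  # hypothetical file names
    npz_path = os.path.join(tmp, "many.npz")
    npzc_path = os.path.join(tmp, "many_compressed.npz")

    np.save(npy_path, arr)                                     # one array -> .npy
    np.savez(npz_path, first=arr, second=arr * 2)              # zip, stored
    np.savez_compressed(npzc_path, first=arr, second=arr * 2)  # zip, deflated

    # np.load gives an ndarray for .npy and an NpzFile for .npz
    single = np.load(npy_path)
    with np.load(npz_path) as plain, np.load(npzc_path) as packed:
        same = (np.array_equal(plain["second"], packed["second"])
                and np.array_equal(single, plain["first"]))

print(type(single).__name__, same)  # ndarray True
```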

Skimming NumPy's source code shows what happens under the hood:

```python
# Excerpt from NumPy's source (elided parts shown as ...)
def _savez(file, args, kwds, compress, allow_pickle=True, pickle_kwargs=None):
    ...
    if compress:
        compression = zipfile.ZIP_DEFLATED  # savez_compressed: deflate
    else:
        compression = zipfile.ZIP_STORED    # savez: no compression
    ...


def savez(file, *args, **kwds):
    _savez(file, args, kwds, False)


def savez_compressed(file, *args, **kwds):
    _savez(file, args, kwds, True)
```
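The effect of that single `compress` flag is visible in the archive itself. A minimal sketch, assuming NumPy is installed (file names are hypothetical): the zip metadata records ZIP_STORED for a member written by np.savez and ZIP_DEFLATED for one written by np.savez_compressed.

```python
import os
import tempfile
import zipfile

import numpy as np

data = np.zeros(1000)

with tempfile.TemporaryDirectory() as tmp:
    stored_path = os.path.join(tmp, "stored.npz")  # hypothetical names
    deflated_path = os.path.join(tmp, "deflated.npz")
    np.savez(stored_path, data=data)
    np.savez_compressed(deflated_path, data=data)

    # Read the compression method recorded for the member "data.npy".
    with zipfile.ZipFile(stored_path) as zf:
        stored_method = zf.getinfo("data.npy").compress_type
    with zipfile.ZipFile(deflated_path) as zf:
        deflated_method = zf.getinfo("data.npy").compress_type

print(stored_method == zipfile.ZIP_STORED)      # True
print(deflated_method == zipfile.ZIP_DEFLATED)  # True
```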

Back to the question, then:

  • With np.savez, there is no compression on top of the .npy format; you just get a single archive file, for the convenience of managing multiple related arrays.
  • With np.savez_compressed, the file takes less space on disk (how much less depends on how compressible the data is), at the cost of extra CPU time for compression and decompression, i.e. saving and loading are somewhat slower.
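A rough sketch of that trade-off, assuming NumPy is installed (array sizes and file names are arbitrary): it compares on-disk sizes for a highly redundant array and for random floats. Exact byte counts depend on the NumPy and zlib versions, so none are quoted; expect the zeros to shrink dramatically and the random data only modestly. Either way, the arrays occupy the same RAM once loaded, because decompression restores the full array.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(42)
arrays = {
    "zeros": np.zeros(1_000_000),    # highly redundant, compresses very well
    "noise": rng.random(1_000_000),  # random mantissa bits, compresses poorly
}

sizes = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, arr in arrays.items():
        plain = os.path.join(tmp, f"{name}.npz")     # hypothetical names
        packed = os.path.join(tmp, f"{name}_c.npz")
        np.savez(plain, arr=arr)
        np.savez_compressed(packed, arr=arr)
        sizes[name] = (os.path.getsize(plain), os.path.getsize(packed))

for name, (plain_size, packed_size) in sizes.items():
    print(f"{name}: savez {plain_size} B, savez_compressed {packed_size} B")
```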
answered Sep 19 '22 at 14:09 by YaOzI