
Can I annotate a NumPy array when saving it with savez?

Tags:

python

numpy

Suppose my program creates a large array of data which I then save with numpy's savez routine. I'd also like to store some additional information together with that array, for example the git commit id of the current version and the input parameters used to generate the data, so that later I can look at the data and know exactly how I created it.

Is there a way to save this information directly together with the array in an npz file, or would I have to create a separate file?

Lagerbaer asked Jun 27 '12 18:06



2 Answers

In a nutshell, you can (an .npz file is just a zip archive of .npy files, one per named array), but you're probably better off switching to something else. (It looks like @JoshAdel just posted a nice example of doing this if you do want to stick with .npz.)

HDF is a far better choice for something like this.

Each group or dataset in an hdf file can store attributes.

I'd recommend h5py for storing numpy arrays in an hdf file.

As an example:

import numpy as np
import h5py

somearray = np.random.random(100)

# The with-block closes the file even if an error occurs
with h5py.File('test.hdf', 'w') as f:
    dataset = f.create_dataset('my_data', data=somearray)

    # Store attributes about your dataset using dictionary-like access
    dataset.attrs['git id'] = 'yay this is a string'
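
Reading the attributes back works the same way. The following is a minimal sketch of a write/read round trip, assuming h5py is installed; the temporary file path and the extra `n_samples` attribute are made up for illustration and are not part of the original example.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a temporary directory so the sketch leaves nothing behind
# in the working directory (the path is an assumption).
path = os.path.join(tempfile.mkdtemp(), "test.hdf")
somearray = np.random.random(100)

with h5py.File(path, "w") as f:
    dataset = f.create_dataset("my_data", data=somearray)
    dataset.attrs["git id"] = "yay this is a string"
    dataset.attrs["n_samples"] = 100  # hypothetical extra attribute

# Reopen read-only and pull out both the array and its attributes.
with h5py.File(path, "r") as f:
    dataset = f["my_data"]
    restored = dataset[...]        # read the whole dataset into a numpy array
    attrs = dict(dataset.attrs)    # copy attributes before the file closes

print(attrs["git id"])
```

Copying the attributes into a plain dict before the file closes keeps them usable afterwards; numeric attributes come back as numpy scalars rather than Python ints.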

Joe Kington answered Sep 30 '22 14:09


You should be able to:

In [2]: a = np.arange(10)

In [3]: b = 'git push'

In [5]: np.savez('file',a=a,b=b)

In [7]: data = np.load('file.npz')

In [8]: data.files
Out[8]: ['a', 'b']

In [9]: data['a']
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]: str(data['b'])
Out[10]: 'git push'

So you can save arbitrary named data and get a dictionary-like object back. A more flexible format with built-in support for all sorts of metadata is HDF5, which you can use through either h5py or PyTables:

http://h5py.alfven.org/docs/

http://www.pytables.org/
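
If you do stay with .npz, structured metadata such as the input parameters from the question can be serialised to a JSON string and stored as one more named entry. This is only a sketch: the metadata keys, the commit id value, and the file name are invented for illustration.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical metadata; substitute the real commit id and parameters.
metadata = {"git_commit": "abc1234", "params": {"n": 100, "seed": 0}}

rng = np.random.default_rng(metadata["params"]["seed"])
data = rng.random(metadata["params"]["n"])

path = os.path.join(tempfile.mkdtemp(), "results.npz")

# savez stores the JSON string as a 0-d string array next to the data.
np.savez(path, data=data, metadata=json.dumps(metadata))

# String arrays load without pickle, so the default allow_pickle=False is fine.
with np.load(path) as npz:
    loaded_data = npz["data"]
    # .item() turns the 0-d array back into a plain Python str.
    loaded_metadata = json.loads(npz["metadata"].item())

print(loaded_metadata["git_commit"])
```

Using JSON instead of saving a dict directly avoids object arrays, which would require `allow_pickle=True` when loading.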

JoshAdel answered Sep 30 '22 15:09