Suppose my program creates a large array of data which I then save with NumPy's savez routine. However, I'd also like to store some additional information together with that array. Examples would be the git commit id of the current version and the input parameters used to generate the data, so that later I can look at the data and know exactly how I created it.
Is there a way to save this information directly together with the array in a npz file, or would I have to create a separate file?
In a nutshell, you can (an .npz file is just a zip archive of .npy arrays, one per name you pass to savez), but you're probably better off switching to something else. (It looks like @JoshAdel just posted a nice example of doing this if you do want to stick with .npz.)
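If you do stay with .npz, one option is to serialize the metadata to a JSON string and store it next to the array under its own key. This is a minimal sketch, not necessarily what @JoshAdel posted; the key name `metadata`, the file name `run.npz`, and the commit id are hypothetical placeholders. Because a string is saved as a plain 0-d unicode array, loading it back does not require pickling.

```python
import json
import numpy as np

# Hypothetical data and metadata; replace with your real results and parameters.
data = np.random.random(100)
metadata = {
    "git_commit": "0123abcd",  # hypothetical commit id
    "params": {"n_samples": 100, "seed": 42},
}

# Save the array and the JSON-encoded metadata side by side in one .npz file.
np.savez("run.npz", data=data, metadata=json.dumps(metadata))

# Load both back; the metadata string comes back as a 0-d unicode array,
# so .item() turns it into a Python str before decoding the JSON.
with np.load("run.npz") as npz:
    loaded_data = npz["data"]
    loaded_meta = json.loads(npz["metadata"].item())

print(loaded_meta["git_commit"])  # 0123abcd
print(np.array_equal(loaded_data, data))  # True
```

JSON keeps the metadata human-readable and avoids object arrays, which would need allow_pickle=True when loading.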
HDF5 is a far better choice for something like this. Each group or dataset in an HDF5 file can store attributes. I'd recommend h5py for storing NumPy arrays in an HDF5 file.
As an example:
```python
import numpy as np
import h5py

somearray = np.random.random(100)

# The context manager closes the file even if an exception is raised.
with h5py.File('test.hdf', 'w') as f:
    dataset = f.create_dataset('my_data', data=somearray)
    # Store attributes about your dataset using dictionary-like access
    dataset.attrs['git id'] = 'yay this is a string'
```
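Attributes can also be attached to the file's root group, and they read back through the same dictionary-like attrs interface. This is a sketch assuming h5py 3.x (where string attributes read back as Python str); the attribute names and commit id are hypothetical.

```python
import h5py
import numpy as np

somearray = np.random.random(100)

# Write: run-level metadata on the root group, per-array metadata on the dataset.
with h5py.File("test.hdf", "w") as f:
    f.attrs["git_commit"] = "0123abcd"  # hypothetical commit id
    dset = f.create_dataset("my_data", data=somearray)
    dset.attrs["n_samples"] = 100
    dset.attrs["seed"] = 42

# Read: attrs behave like a dict, so copy them out before the file closes.
with h5py.File("test.hdf", "r") as f:
    run_info = dict(f.attrs)
    data_info = dict(f["my_data"].attrs)
    loaded = f["my_data"][()]  # read the whole dataset into a NumPy array

print(run_info["git_commit"])  # 0123abcd
print(int(data_info["seed"]))  # 42
```

Numeric attributes come back as NumPy scalars, hence the int() conversion.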
You should be able to do this:

```
In [1]: import numpy as np
In [2]: a = np.arange(10)
In [3]: b = 'git push'
In [5]: np.savez('file', a=a, b=b)
In [7]: data = np.load('file.npz')
In [8]: data.files
Out[8]: ['a', 'b']
In [9]: data['a']
Out[9]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]: str(data['b'])
Out[10]: 'git push'
```
So you can save arbitrary named data (non-array values such as strings are stored as 0-d arrays) and get a dictionary-like object back. Perhaps a better format, one that may be more flexible and has built-in support for all sorts of metadata, is HDF5, using either h5py or PyTables:
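Since np.load returns a dictionary-like NpzFile, another approach is to store each scalar parameter under its own key and rebuild a plain dict when loading. This is a sketch; the `param_` prefix and the parameter names and values are hypothetical. Scalars come back as 0-d arrays, so .item() converts them to ordinary Python objects.

```python
import numpy as np

a = np.arange(10)
params = {"git_commit": "0123abcd", "n_steps": 10, "dt": 0.01}  # hypothetical values

# Prefix parameter keys so they cannot collide with the names of data arrays.
np.savez("file.npz", a=a, **{"param_" + k: v for k, v in params.items()})

with np.load("file.npz") as data:
    # data.files lists every stored key, much like dict.keys().
    restored = {
        k[len("param_"):]: data[k].item()
        for k in data.files
        if k.startswith("param_")
    }
    a_loaded = data["a"]

print(restored)
```

This keeps every value loadable without pickling, at the cost of only handling flat, scalar-valued parameters.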
http://h5py.alfven.org/docs/
http://www.pytables.org/