
Add metadata comment to NumPy ndarray

I have a NumPy ndarray holding three large arrays and I'd just like to store the path to the file that generated the data in there somewhere. Some toy data:

A = array([[  6.52479351e-01,   6.54686928e-01,   6.56884432e-01, ...,
              2.55901861e+00,   2.56199503e+00,   2.56498647e+00],
           [             nan,              nan,   9.37914686e-17, ...,
              1.01366425e-16,   3.20371075e-16,  -6.33655223e-17],
           [             nan,              nan,   8.52057308e-17, ...,
              4.26943463e-16,   1.51422386e-16,   1.55097437e-16]],                 
           dtype=float32)

I can't just append it as an array to the ndarray because it needs to be the same length as the other three.

I could just append np.zeros(len(A[0])) and make the first value the string so that I can retrieve it with A[-1][0] but that seems ridiculous.

Is there some metadata key I can use to store a string like '/Documents/Data/foobar.txt' so I can retrieve it with something like A.metadata.comment?

Thanks!

Joe Flip asked Jan 23 '16 18:01


2 Answers

TobiasR's comment is the simplest way, but you could also subclass ndarray. See the NumPy documentation on subclassing ndarray or this question.

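import numpy as np

class MetaArray(np.ndarray):
    """Array with metadata."""

    def __new__(cls, array, dtype=None, order=None, **kwargs):
        # View the input as a MetaArray and attach the keyword arguments as metadata.
        obj = np.asarray(array, dtype=dtype, order=order).view(cls)
        obj.metadata = kwargs
        return obj

    def __array_finalize__(self, obj):
        # Called when views and slices are created; reuse the source array's metadata.
        if obj is None:
            return
        self.metadata = getattr(obj, 'metadata', None)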

Example usage:

>>> a = MetaArray([1,2,3], comment='/Documents/Data/foobar.txt')
>>> a.metadata
{'comment': '/Documents/Data/foobar.txt'}
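
A quick check of how the subclass behaves, as a minimal sketch: slices and other views pass through `__array_finalize__`, so they carry the same `metadata` dict (the same object, not a copy), while converting back to a plain ndarray drops it. The class is repeated so the block runs on its own; the names `tail` and `plain` are only illustrative.

```python
import numpy as np


class MetaArray(np.ndarray):
    """Array with metadata (same class as above)."""

    def __new__(cls, array, dtype=None, order=None, **kwargs):
        obj = np.asarray(array, dtype=dtype, order=order).view(cls)
        obj.metadata = kwargs
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.metadata = getattr(obj, 'metadata', None)


a = MetaArray([1, 2, 3], comment='/Documents/Data/foobar.txt')

# Slicing creates a view, which triggers __array_finalize__ with obj=a.
tail = a[1:]
print(type(tail).__name__)
print(tail.metadata['comment'])

# The dict is shared between the original and the view, not copied.
print(tail.metadata is a.metadata)

# np.asarray returns a base ndarray, which has no metadata attribute.
plain = np.asarray(a)
print(hasattr(plain, 'metadata'))
```

Because the dict is shared, changing `tail.metadata` also changes `a.metadata`; copy the dict in `__array_finalize__` if that is not what you want.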
Bertrand L answered Sep 20 '22 10:09


It sounds like you may be interested in storing metadata in a persistent way along with your array. If so, HDF5 is an excellent option to use as a storage container.

For example, let's create an array and save it to an HDF file with some metadata using h5py:

import numpy as np
import h5py

some_data = np.random.random((100, 100))

with h5py.File('data.hdf', 'w') as outfile:
    dataset = outfile.create_dataset('my data', data=some_data)

    dataset.attrs['an arbitrary key'] = 'arbitrary values'
    dataset.attrs['foo'] = 10.2

We can then read it back in:

import h5py

with h5py.File('data.hdf', 'r') as infile:
    dataset = infile['my data']
    some_data = dataset[...] # Load it into memory. Could also slice a subset.

    print(dataset.attrs['an arbitrary key'])
    print(dataset.attrs['foo'])

As others have mentioned, if you are only concerned with storing the data + metadata in memory, a better option is a dict or simple wrapper class. For example:

class Container:
    def __init__(self, data, **kwargs):
        self.data = data
        self.metadata = kwargs

Of course, this won't behave like a numpy array directly, but it's usually a bad idea to subclass ndarrays. (You can, but it's easy to do incorrectly. You're almost always better off designing a class that stores the array as an attribute.)
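
Applied to the case in the question, the two in-memory options might look like the sketch below. The array `A` here is a made-up stand-in for the real data, and `record` and `row_means` are illustrative names only.

```python
import numpy as np


class Container:
    def __init__(self, data, **kwargs):
        self.data = data
        self.metadata = kwargs


# Stand-in for the three large rows in the question (values are made up).
A = np.zeros((3, 5), dtype=np.float32)

# Option 1: a plain dict holding the array next to its source path.
record = {'data': A, 'comment': '/Documents/Data/foobar.txt'}

# Option 2: the wrapper class, with metadata passed as keyword arguments.
c = Container(A, comment='/Documents/Data/foobar.txt')

# NumPy operations are applied to the array attribute, not to the wrapper.
row_means = c.data.mean(axis=1)
print(c.metadata['comment'])
print(row_means.shape)
```

Neither option copies the array; both simply hold a reference to `A` alongside the path.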

Better yet, make any operations you're doing methods of a similar class to the example above. For example:

import scipy.signal
import numpy as np

class SeismicCube(object):
    def __init__(self, data, bounds, metadata=None):
        self.data = data
        self.x0, self.x1, self.y0, self.y1, self.z0, self.z1 = bounds
        self.bounds = bounds
        self.metadata = {} if metadata is None else metadata

    def inside(self, x, y, z):
        """Test if a point is inside the cube (assumes x0 <= x1, y0 <= y1, z0 <= z1)."""
        inx = self.x0 <= x <= self.x1
        iny = self.y0 <= y <= self.y1
        inz = self.z0 <= z <= self.z1
        return inx and iny and inz

    def inst_amp(self):
        """Calculate instantaneous amplitude and return a new SeismicCube."""
        hilb = scipy.signal.hilbert(self.data, axis=2)
        data = np.hypot(hilb.real, hilb.imag)
        return type(self)(data, self.bounds, self.metadata)
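
To see the pattern in isolation, here is a stripped-down sketch that avoids the SciPy dependency. `Cube` and its `magnitude` method are hypothetical stand-ins for `SeismicCube` and `inst_amp`, and the sample values and bounds are made up. The point is only that each operation returns a new object of the same type, so the bounds and metadata travel with the data.

```python
import numpy as np


class Cube:
    """Hypothetical, simplified stand-in for SeismicCube."""

    def __init__(self, data, bounds, metadata=None):
        self.data = data
        self.bounds = bounds
        self.metadata = {} if metadata is None else metadata

    def magnitude(self):
        """Stand-in for inst_amp: return a new Cube holding |data|."""
        return type(self)(np.abs(self.data), self.bounds, self.metadata)


cube = Cube(np.array([[[-1.0, 2.0], [3.0, -4.0]]]),
            bounds=(0, 1, 0, 2, 0, 2),
            metadata={'comment': '/Documents/Data/foobar.txt'})
amp = cube.magnitude()

# The result is a Cube again, with the same bounds and metadata attached.
print(type(amp).__name__)
print(amp.metadata['comment'])
print(amp.bounds)
```

Note that the new object shares the metadata dict with the original; pass `dict(self.metadata)` instead if each result should own its own copy.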
Joe Kington answered Sep 19 '22 10:09