Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

transposing a h5py dataset without using a numpy array

I need to transpose a h5py dataset, to access a 3D array as a stack of 2D images.

I want to be able to slice the 3D volume in any of the 3 possible orientations, while keeping the first dimension as the image index.

I don't want to cast my dataset into a numpy array, to avoid reading the entire dataset from disk if only some of the images need to be displayed.

like image 834
PiRK Avatar asked Jan 25 '26 16:01

PiRK


2 Answers

This is a solution using a proxy object that adds a layer on top of the dataset's __getitem__ method to take the transposition into account. It should work for any number of dimensions, but was tested extensively only in 3D.

Example:

my_3D_dataset_201_transposition = TransposedDatasetView(
        my_3D_dataset,
        transposition=(2, 0, 1))
assert my_3D_dataset[i, j, k] == my_3D_dataset_201_transposition[k, i, j]

My class is defined the following way:

class TransposedDatasetView(object):
    """
    This class provides a way to transpose a dataset without
    casting it into a numpy array. This way, the dataset in a file need not
    necessarily be integrally read into memory to view it in a different
    transposition.

    .. note::
        The performances depend a lot on the way the dataset was written
        to file. Depending on the chunking strategy, reading a complete 2D slice
        in an unfavorable direction may still require the entire dataset to
        be read from disk.

    :param dataset: h5py dataset
    :param transposition: List of dimension numbers in the wanted order
    """
    def __init__(self, dataset, transposition=None):
        """

        """
        super(TransposedDatasetView, self).__init__()
        self.dataset = dataset
        """original dataset"""

        self.shape = dataset.shape
        """Tuple of array dimensions"""
        self.dtype = dataset.dtype
        """Data-type of the array’s element"""
        self.ndim = len(dataset.shape)
        """Number of array dimensions"""

        size = 0
        if self.ndim:
            size = 1
            for dimsize in self.shape:
                size *= dimsize
        self.size = size
        """Number of elements in the array."""

        self.transposition = list(range(self.ndim))
        """List of dimension indices, in an order depending on the
        specified transposition. By default this is simply
        [0, ..., self.ndim], but it can be changed by specifying a different
        `transposition` parameter at initialization.

        Use :meth:`transpose`, to create a new :class:`TransposedDatasetView`
        with a different :attr:`transposition`.
        """

        if transposition is not None:
            assert len(transposition) == self.ndim
            assert set(transposition) == set(list(range(self.ndim))), \
                "Transposition must be a list containing all dimensions"
            self.transposition = transposition
            self.__sort_shape()

    def __sort_shape(self):
        """Sort shape in the order defined in :attr:`transposition`
        """
        new_shape = tuple(self.shape[dim] for dim in self.transposition)
        self.shape = new_shape

    def __sort_indices(self, indices):
        """Return array indices sorted in the order needed
        to access data in the original non-transposed dataset.

        :param indices: Tuple of ndim indices, in the order needed
            to access the view
        :return: Sorted tuple of indices, to access original data
        """
        assert len(indices) == self.ndim
        sorted_indices = tuple(idx for (_, idx) in
                               sorted(zip(self.transposition, indices)))
        return sorted_indices

    def __getitem__(self, item):
        """Handle fancy indexing with regards to the dimension order as
        specified in :attr:`transposition`

        The supported fancy-indexing syntax is explained at
        http://docs.h5py.org/en/latest/high/dataset.html#fancy-indexing.

        Additional restrictions exist if the data has been transposed:

            - numpy boolean array indexing is not supported
            - ellipsis objects are not supported

        :param item: Index, possibly fancy index (must be supported by h5py)
        :return:
        """
        # no transposition, let the original dataset handle indexing
        if self.transposition == list(range(self.ndim)):
            return self.dataset[item]

        # 1-D slicing -> n-D slicing (n=1)
        if not hasattr(item, "__len__"):
            # first dimension index is given
            item = [item]
            # following dimensions are indexed with : (all elements)
            item += [slice(0, sys.maxint, 1) for _i in range(self.ndim - 1)]

        # n-dimensional slicing
        if len(item) != self.ndim:
            raise IndexError(
                "N-dim slicing requires a tuple of N indices/slices. " +
                "Needed dimensions: %d" % self.ndim)

        # get list of indices sorted in the original dataset order
        sorted_indices = self.__sort_indices(item)

        output_data_not_transposed = self.dataset[sorted_indices]

        # now we must transpose the output data
        output_dimensions = []
        frozen_dimensions = []
        for i, idx in enumerate(item):
            # slices and sequences
            if not isinstance(idx, int):
                output_dimensions.append(self.transposition[i])
            # regular integer index
            else:
                # whenever a dimension is fixed (indexed by an integer)
                # the number of output dimension is reduced
                frozen_dimensions.append(self.transposition[i])

        # decrement output dimensions that are above frozen dimensions
        for frozen_dim in reversed(sorted(frozen_dimensions)):
            for i, out_dim in enumerate(output_dimensions):
                if out_dim > frozen_dim:
                    output_dimensions[i] -= 1

        assert (len(output_dimensions) + len(frozen_dimensions)) == self.ndim
        assert set(output_dimensions) == set(range(len(output_dimensions)))

        return numpy.transpose(output_data_not_transposed,
                               axes=output_dimensions)

    def __array__(self, dtype=None):
        """Cast the dataset into a numpy array, and return it.

        If a transposition has been done on this dataset, return
        a transposed view of a numpy array."""
        return numpy.transpose(numpy.array(self.dataset, dtype=dtype),
                               self.transposition)

    def transpose(self, transposition=None):
        """Return a re-ordered (dimensions permutated)
        :class:`TransposedDatasetView`.

        The returned object refers to
        the same dataset but with a different :attr:`transposition`.

        :param list[int] transposition: List of dimension numbers in the wanted order
        :return: Transposed TransposedDatasetView
        """
        # by default, reverse the dimensions
        if transposition is None:
            transposition = list(reversed(self.transposition))

        return TransposedDatasetView(self.dataset,
                                     transposition)
like image 132
PiRK Avatar answered Jan 28 '26 08:01

PiRK


Having a similar problem, and inspired largely by PiRK's response, my team is distributing a pip-installable package for this: https://pypi.org/project/lazy-ops/

like image 43
ben.dichter Avatar answered Jan 28 '26 07:01

ben.dichter



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!