
Axes labels within numpy arrays

Tags: python, numpy

Does np.ndarray have the functionality of carrying axes labels?

Let's say that I have a 2-D array with dimensions being time and speed. I want to actually have both axes labels (time and speed values) embedded in an object, so that the object takes care of the axes whenever I do operations with the array (e.g. slice or even plot).

After searching for a while I haven't found anything. I was about to start coding such a class myself and then I thought about asking here, just in case I've missed something.

Thanks

EDIT

Given the comments and answers so far, I think I haven't explained myself clearly, or the reasoning behind what I want isn't clear because of the oversimplified example [time, speed].

In the field I work in, it's common to have recordings from multiple sensors and then segment the data so that you have multiple samples/events. If each sensor captures a 1-dimensional signal across time, one has dimensions [Sensor, Event, Time] (the dimension is implicit in the data itself).

When using a pure numpy.ndarray, you end up with four variables: data, a 3-D array with the recorded data; sensor, a 1-D np.recarray with all the information for each sensor (e.g. name, location, ...); event, a 1-D np.recarray with all the information for each sample/event (e.g. type, offset, ...); and time, a vector with the time values.

What I want is to have all that information in a single object, mydata, and not worry about basic manipulations such as slicing, so that mydata[0:3, 1:10] slices the corresponding dimensions accordingly.
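A minimal sketch of the kind of object described above, written as a plain wrapper rather than an np.ndarray subclass. The class name LabeledArray, its attributes, and the example shapes and label values are all hypothetical, and only slice indexing is handled:

```python
import numpy as np


class LabeledArray:
    """Hypothetical container pairing an N-D array with one label vector per axis."""

    def __init__(self, data, labels):
        data = np.asarray(data)
        labels = [np.asarray(lab) for lab in labels]
        if len(labels) != data.ndim:
            raise ValueError("need exactly one label vector per axis")
        for axis, lab in enumerate(labels):
            if lab.shape[0] != data.shape[axis]:
                raise ValueError(f"labels for axis {axis} do not match the data shape")
        self.data = data
        self.labels = labels

    def __getitem__(self, key):
        # Only slices are handled in this sketch, so no axis is ever dropped.
        if not isinstance(key, tuple):
            key = (key,)
        if not all(isinstance(k, slice) for k in key):
            raise TypeError("this sketch supports slice objects only")
        # Pad with full slices so that every axis has an entry.
        key = key + (slice(None),) * (self.data.ndim - len(key))
        sliced_labels = [lab[k] for lab, k in zip(self.labels, key)]
        return LabeledArray(self.data[key], sliced_labels)


# Made-up example data with dimensions [sensor, event, time].
data = np.zeros((4, 12, 100))
sensor = np.array(["s0", "s1", "s2", "s3"])
event = np.arange(12)
time = np.linspace(0.0, 1.0, 100)

mydata = LabeledArray(data, [sensor, event, time])
mydata1 = mydata[0:3, 1:10]   # data and labels are sliced together
print(mydata1.data.shape)     # (3, 9, 100)
print(mydata1.labels[0])      # ['s0' 's1' 's2']
```

Integer and fancy indexing, arithmetic, and ufunc support are deliberately left out; each would need its own rule for what happens to the labels.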

I agree that things like plotting will be data specific, but I'll happily code a subclass of such object with some extra functions (e.g. plot).

Why would this be useful?

Readability: Compare

data1 = data[0:3, 1:10]
sensor1 = sensor[0:3]
event1 = event[1:10]
time1 = time

with a simple

mydata1 = mydata[0:3, 1:10]

Maintenance: The second option is obviously easier to maintain and less error-prone, since you no longer have to slice every associated variable correctly by hand.

Convenience: Having all this information in the same place makes it possible to integrate useful and powerful functions into the class. For example, if I create a derived class for time series (one that requires a time axis), I can run time-specific functions without having to specify the time vector or sampling frequency, as this information is within the object itself. The idea is to have a base class carrying the axes' labels; specific subclasses will naturally arise when necessary (e.g. one for time series, one for video, one for topographic information, etc.), incorporating specialized functionality.

Close but not exactly

As @user2357112 mentioned, Pandas' DataFrame is close to what I'm looking for. But, apart from the fact that its support for N-D arrays is still experimental, it seems to be too oriented towards table-like behaviour (from what I've read so far), e.g. treating the first dimension differently from the others (items vs columns).

Is it worth it?

The above may seem trivial and not worth the effort, but I programmed a subclass of np.ndarray with such functionality a few years ago and I can assure you it made my life and code so much easier! (The specific application was similar to the example above: [sensor, sample, time].) But that was back when I was learning Python, and the way I coded it isn't what you'd call pretty. It also has some fundamental faults, like the axes labels not following the same memory-sharing rules as np.ndarray.
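For reference, the memory-sharing behaviour that axis labels would have to mirror can be checked with np.shares_memory. This is only an illustration with made-up shapes, not the code from the subclass mentioned above:

```python
import numpy as np

data = np.zeros((4, 12, 100))        # made-up [sensor, sample, time] array
time = np.linspace(0.0, 1.0, 100)

view = data[0:3, 1:10]               # basic slicing returns a view
time_view = time[10:20]              # also a view into the label vector
time_copy = time[10:20].copy()       # an explicit copy breaks the link

print(np.shares_memory(data, view))        # True
print(np.shares_memory(time, time_view))   # True
print(np.shares_memory(time, time_copy))   # False

# Writing through the view changes the original array.
view[...] = 1.0
print(data[0, 1, 0])                       # 1.0
```

A label container that copies its label vectors on every slice would behave like time_copy here, while the data array itself behaves like view.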

Before going to the trouble of rewriting this thing and making it public, I wanted to know if there's something similar out there.

Marcos asked Feb 18 '16 17:02


1 Answer

Here are the two options I could see working:

  1. Some insane 5D NumPy array with loads of repeating elements and sub-elements (which will not help at all with one of the primary problems you are trying to solve, namely ease of indexing).
  2. Multi-indexing and pivot tables from Pandas.

Pandas Docs - Intro documentation for Pandas generally if you're unused to the library. This link will take you to the hierarchical-indexing / pivot table section

Pandas Docs - Multi/Hierarchical Indexing

Pandas Docs - Pivot Tables, Stacking and Unstacking

An article with a worked example using LoTR script data

Without some example data showing exactly what you're working with, I could only copy and paste some example code from these links.

True, it is Pandas, which you said wouldn't meet your needs. But the thing you pushed back against was the limitation of only having labels on one axis. Hierarchical indexing is Pandas' answer to this problem, and pivot tables give you easy (if initially obscure) methods for reshaping your data into the arrangements you need for a given purpose.

Accessing elements and subgroups of the data is also very easy, which was one of your main requirements.

As far as I am aware, it also maintains the high numerical performance typical of Pandas.
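As a rough sketch of the hierarchical-indexing approach, assuming made-up sensor names, event numbers, and values (none of this comes from the question's actual data), the [sensor, event] axes can go into a row MultiIndex and time into the columns:

```python
import numpy as np
import pandas as pd

# Hypothetical labels for a small [sensor, event, time] data set.
sensors = ["s0", "s1", "s2"]
events = [0, 1]
time = np.linspace(0.0, 1.0, 5)

# One row per (sensor, event) pair, one column per time point.
index = pd.MultiIndex.from_product([sensors, events], names=["sensor", "event"])
values = np.arange(len(index) * len(time), dtype=float).reshape(len(index), len(time))
df = pd.DataFrame(values, index=index, columns=pd.Index(time, name="time"))

# Cross-section: all events of one sensor (the 'sensor' level is dropped).
one_sensor = df.xs("s1", level="sensor")

# Label-based slice over sensors; label slices include both endpoints.
subset = df.loc[pd.IndexSlice["s0":"s1", :], :]

# Back to a plain 3-D NumPy array when needed.
cube = df.to_numpy().reshape(len(sensors), len(events), len(time))

print(one_sensor.shape)   # (2, 5)
print(subset.shape)       # (4, 5)
print(cube.shape)         # (3, 2, 5)
```

Label slicing with pd.IndexSlice relies on the row index being sorted, which holds here because from_product is given sorted inputs.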

Gheko answered Sep 20 '22 22:09