
How to convert a pandas MultiIndex DataFrame into a 3D array

Suppose I have a MultiIndex DataFrame:

                                c       o       l       u
major       timestamp                       
ONE         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

TWO         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

I want to generate a 3-dimensional NumPy array from this DataFrame. Given that the DataFrame has 15 categories in the major index level, 4 columns, and a time index of length 5, I would like to create a NumPy array with shape (4, 15, 5), denoting (columns, categories, time_index) respectively.

For the two categories shown above, this should create an array:

array([[[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]]])

One used to be able to do this with pd.Panel:

panel = pd.Panel(items=[columns], major_axis=[categories], minor_axis=[time_index], dtype=np.float32)
... 

How can I accomplish this most efficiently with a MultiIndex DataFrame? Thanks.

James asked Feb 10 '19 11:02



3 Answers

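Assuming the rows are sorted by index and every category has the same number of timestamps (100 in this example; the question's data would use 5), df.values is a (15*100, 4)-shaped array, so you can call reshape to make it a (15, 100, 4)-shaped array: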

arr = df.values.reshape(15, 100, 4)

Then call transpose to rearrange the order of the axes:

arr = arr.transpose(2, 0, 1)

Now arr has shape (4, 15, 100).
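
The two steps above can be sketched end to end on a small frame. This is only an illustration under stated assumptions: the toy data (2 categories, 5 timestamps) and the variable names are made up here, and the sizes are read from `df.index.levshape` instead of being hard-coded. It relies on the frame being sorted and on every category having the same timestamps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 2 categories x 5 timestamps x 4 columns.
categories = ["ONE", "TWO"]
timestamps = pd.date_range("2019-01-22 18:12:00", periods=5, freq="min")
index = pd.MultiIndex.from_product(
    [categories, timestamps], names=["major", "timestamp"]
)
df = pd.DataFrame(
    np.arange(len(index) * 4, dtype=np.float32).reshape(len(index), 4),
    index=index,
    columns=list("colu"),
)

# reshape assumes rows are grouped by category, then ordered by timestamp.
df = df.sort_index()

# levshape gives the number of distinct values in each index level.
n_major, n_time = df.index.levshape

# (n_major * n_time, 4) -> (n_major, n_time, 4) -> (4, n_major, n_time)
arr = df.to_numpy().reshape(n_major, n_time, -1).transpose(2, 0, 1)

print(arr.shape)  # (4, 2, 5)
```

Indexing `arr[k, i, j]` then gives column k, category i, timestamp j, which can be spot-checked against `df.loc[(category, timestamp), column]`.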


Using reshape/transpose is ~960x faster than to_xarray().to_array():

In [21]: df = pd.DataFrame(np.random.randint(10, size=(15*100, 4)), index=pd.MultiIndex.from_product([range(15), range(100)], names=['A','B']), columns=list('colu'))

In [22]: %timeit arr = df.values.reshape(15, 100, 4).transpose(2, 0, 1)
3.31 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [24]: %timeit df.to_xarray().to_array()
3.18 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [25]: 3180/3.31
Out[25]: 960.7250755287009

unutbu answered Oct 24 '22 05:10


How about using xarray?

res = df.to_xarray().to_array()

The result is an xarray DataArray of shape (4, 15, 5); call .values on it if you need a plain NumPy array.

In fact, the pandas docs now recommend this as an alternative to pandas Panel. Note that you must have the xarray package installed.

Josh Friedlander answered Oct 24 '22 06:10


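If the minor axis has a different length for each category, you can try this, which fills the gaps from neighbouring categories and returns an array of shape (categories, time_index, columns):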

df.unstack().ffill().bfill().stack().values.reshape(*df.index.levshape,-1)

It still seems awkward, though. Why was Panel deprecated anyway?
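
If padding with NaN is acceptable instead of forward/back filling, one alternative (not the one-liner above, but a different technique) is to reindex the frame onto the full category x timestamp grid before reshaping. A minimal sketch, assuming made-up toy data in which "TWO" lacks its last timestamp, and assuming the index levels contain no unused values:

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: "TWO" is missing its last timestamp.
timestamps = pd.date_range("2019-01-22 18:12:00", periods=3, freq="min")
rows = [("ONE", t) for t in timestamps] + [("TWO", t) for t in timestamps[:2]]
index = pd.MultiIndex.from_tuples(rows, names=["major", "timestamp"])
df = pd.DataFrame(np.ones((len(index), 4)), index=index, columns=list("colu"))

# Build the full (category x timestamp) grid and align the frame to it;
# combinations absent from df become rows of NaN.
full = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
padded = df.reindex(full)

# Same reshape/transpose as for the regular case, now on the padded frame.
arr = padded.to_numpy().reshape(*full.levshape, -1).transpose(2, 0, 1)

print(arr.shape)  # (4, 2, 3)
```

Here the slot for category "TWO" at the third timestamp holds NaN in every column, so downstream code can mask it explicitly instead of working with filled-in values.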

aEgoist answered Oct 24 '22 04:10