How to convert a pandas MultiIndex DataFrame into a 3D array

Tags: python, arrays, pandas, numpy

Suppose I have a MultiIndex DataFrame:

                                c       o       l       u
major       timestamp                       
ONE         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

TWO         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

I want to generate a 3-dimensional NumPy array from this DataFrame. The DataFrame has 15 categories in the major level, 4 columns, and a time index of length 5, so I would like an array with shape (4, 15, 5) denoting (columns, categories, time_index), respectively.

For the two categories shown above, this should produce an array of shape (4, 2, 5):

array([[[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]]])
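To reproduce the setup, here is a minimal sketch that builds a DataFrame with the described layout (15 categories, 5 one-minute timestamps, 4 columns). The category names CAT0 to CAT14 and the constant 8e-05 values are hypothetical stand-ins, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question's layout:
# 15 categories x 5 one-minute timestamps, 4 value columns.
categories = [f"CAT{i}" for i in range(15)]  # stand-in names, not from the question
timestamps = pd.date_range("2019-01-22 18:12:00", periods=5, freq="min")
index = pd.MultiIndex.from_product([categories, timestamps], names=["major", "timestamp"])
df = pd.DataFrame(np.full((len(index), 4), 8e-05), index=index, columns=list("colu"))

print(df.shape)  # (75, 4): one row per (category, timestamp) pair

# Desired output shape: (columns, categories, time_index)
target_shape = (df.shape[1], *df.index.levshape)
print(target_shape)  # (4, 15, 5)
```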

This used to be possible with pd.Panel, which was deprecated in pandas 0.20 and removed in 0.25:

panel = pd.Panel(items=[columns], major_axis=[categories], minor_axis=[time_index], dtype=np.float32)
...

How can I accomplish this most efficiently with a MultiIndex DataFrame? Thanks

asked Feb 10 '19 11:02

James

3 Answers

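With 100 timestamps per category, df.values is a (15*100, 4)-shaped array, so you can call reshape to make it a (15, 100, 4)-shaped array. This relies on the rows being sorted by category and then by timestamp, with every category having the same 100 timestamps: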

arr = df.values.reshape(15, 100, 4)

Then call transpose to rearrange the order of the axes:

arr = arr.transpose(2, 0, 1)

Now arr has shape (4, 15, 100).
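The same idea can be wrapped in a small helper that reads the level sizes from df.index.levshape instead of hard-coding 15 and 100, sorts the index first, and refuses to reshape an incomplete index. This is a sketch under those assumptions; the name multiindex_to_3d is hypothetical.

```python
import numpy as np
import pandas as pd


def multiindex_to_3d(df: pd.DataFrame) -> np.ndarray:
    """Return a (columns, level0, level1) array from a 2-level MultiIndex DataFrame.

    Hypothetical helper: assumes every (level0, level1) pair appears exactly once.
    """
    # Group rows by level 0, then order by level 1; reshape depends on this order.
    df = df.sort_index()
    n0, n1 = df.index.levshape  # number of values in each index level
    if not df.index.is_unique or len(df) != n0 * n1:
        raise ValueError("index is not a complete, duplicate-free product of its levels")
    # (n0*n1, ncols) -> (n0, n1, ncols) -> (ncols, n0, n1)
    return df.to_numpy().reshape(n0, n1, df.shape[1]).transpose(2, 0, 1)


# Usage on data shaped like the benchmark below
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product([range(15), range(100)], names=["A", "B"])
df = pd.DataFrame(rng.integers(10, size=(15 * 100, 4)), index=idx, columns=list("colu"))
arr = multiindex_to_3d(df)
print(arr.shape)  # (4, 15, 100)
```

Raising on an incomplete index is a deliberate choice: a plain reshape would otherwise either fail with a size error or, with duplicates, silently misplace values.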

In the benchmark below, reshape/transpose is roughly 960x faster than to_xarray().to_array():

In [21]: df = pd.DataFrame(np.random.randint(10, size=(15*100, 4)), index=pd.MultiIndex.from_product([range(15), range(100)], names=['A','B']), columns=list('colu'))

In [22]: %timeit arr = df.values.reshape(15, 100, 4).transpose(2, 0, 1)
3.31 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [24]: %timeit df.to_xarray().to_array()
3.18 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [25]: 3180/3.31
Out[25]: 960.7250755287009
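To check that the reshape puts every value in the right slot, a sketch can compare it with a slower reference that stacks one (4, 100) block per category using df.xs. The reference construction is an illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same construction as the benchmark above.
df = pd.DataFrame(
    np.random.randint(10, size=(15 * 100, 4)),
    index=pd.MultiIndex.from_product([range(15), range(100)], names=["A", "B"]),
    columns=list("colu"),
)

# Fast path: reshape + transpose.
arr = df.to_numpy().reshape(15, 100, 4).transpose(2, 0, 1)

# Slow reference: for each category, take its (100, 4) block, transpose it
# to (4, 100), and stack the 15 blocks along axis 1.
ref = np.stack([df.xs(a, level="A").to_numpy().T for a in range(15)], axis=1)

print(ref.shape)                 # (4, 15, 100)
print(np.array_equal(arr, ref))  # True
```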

answered Oct 24 '22 05:10

unutbu

How about using xarray?

res = df.to_xarray().to_array()

The result is an xarray DataArray of shape (4, 15, 5); call .values on it to get a plain NumPy array.

In fact, the pandas docs recommend xarray as an alternative to the deprecated Panel. Note that you must have the xarray package installed.
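One practical difference from the plain reshape is that to_xarray() tolerates missing (category, timestamp) pairs and fills them with NaN. A pandas-only sketch of that behaviour, assuming a small hypothetical ragged DataFrame, reindexes onto the full product of the index levels before reshaping:

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: category "TWO" is missing its last timestamp.
ts = pd.date_range("2019-01-22 18:12:00", periods=3, freq="min")
idx = pd.MultiIndex.from_tuples(
    [("ONE", ts[0]), ("ONE", ts[1]), ("ONE", ts[2]), ("TWO", ts[0]), ("TWO", ts[1])],
    names=["major", "timestamp"],
)
df = pd.DataFrame(np.full((5, 4), 8e-05), index=idx, columns=list("colu"))

# Reindex onto the full (major x timestamp) grid; absent pairs become NaN rows.
full = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)
n0, n1 = len(df.index.levels[0]), len(df.index.levels[1])
arr = df.reindex(full).to_numpy().reshape(n0, n1, df.shape[1]).transpose(2, 0, 1)

print(arr.shape)            # (4, 2, 3)
print(np.isnan(arr).sum())  # 4: one missing row x 4 columns
```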

answered Oct 24 '22 06:10

Josh Friedlander

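If the categories have different numbers of timestamps (a ragged minor axis), you can fill in the missing rows first and then reshape. The reshape yields (categories, timestamps, columns), so a final transpose gives the (columns, categories, time_index) order from the question:

df.unstack().ffill().bfill().stack().values.reshape(*df.index.levshape, -1).transpose(2, 0, 1)

Note that ffill()/bfill() work down the rows of the unstacked frame, i.e. across categories, so a missing value is copied from a neighbouring category at the same timestamp; use fillna() with a constant if that is not what you want.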
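To see what the one-liner does, here is a sketch on a tiny hypothetical ragged DataFrame. As a precaution it reselects the original column order after stack(), in case stacking reorders the columns in some pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical ragged data: category 1 has only 2 of the 3 timestamps.
idx = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)], names=["major", "t"]
)
df = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=idx, columns=list("colu"))

# unstack() spreads level "t" into columns, inserting NaN for the missing (1, 2);
# ffill()/bfill() fill down the rows, i.e. across categories;
# stack() returns to long form with every (major, t) pair present.
filled = df.unstack().ffill().bfill().stack()
filled = filled[df.columns]  # keep the original c, o, l, u column order

arr = filled.to_numpy().reshape(*df.index.levshape, -1)  # (major, t, columns)
arr = arr.transpose(2, 0, 1)                              # (columns, major, t)
print(arr.shape)  # (4, 2, 3)
```

The filled-in value for column c at (1, 2) is 8.0, copied from category 0 at the same timestamp.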

This still seems awkward, though. Why was Panel deprecated anyway?

answered Oct 24 '22 04:10

aEgoist
