
Constructing 3D Pandas DataFrame

Tags:

python

pandas

I'm having difficulty constructing a 3D DataFrame in Pandas. I want something like this:

```
A               B               C
start    end    start    end    start    end    ...
7        20     42       52     90       101
11       21                     213      34
56       74                     9        45
45       12
```

Here A, B, etc. are the top-level descriptors, and start and end are sub-descriptors. The numbers come in start/end pairs, and A, B, etc. do not all have the same number of pairs: A has four pairs, B has one, and C has three.
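In pandas terms, this layout is a DataFrame whose columns form a two-level `MultiIndex`: the sequence names on the outer level and `start`/`end` on the inner level. A minimal sketch of such a column index, assuming only the three sequence names A, B and C shown in the table:

```python
import pandas as pd

# Outer level: sequence names; inner level: the start/end sub-descriptors.
columns = pd.MultiIndex.from_product([["A", "B", "C"], ["start", "end"]])
print(list(columns))
# [('A', 'start'), ('A', 'end'), ('B', 'start'), ('B', 'end'), ('C', 'start'), ('C', 'end')]
```

The remaining problem is filling those six columns with lists of different lengths.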

I'm not sure how to construct this DataFrame. Modifying this example didn't give me the desired output:

```python
import numpy as np
import pandas as pd

A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end']*3)
C = [np.random.randint(10, 99, 6)]*6
df = pd.DataFrame(list(zip(A, B, C)), columns=['A', 'B', 'C'])
df.set_index(['A', 'B'], inplace=True)
df
```

yielded:

```
                                C
A     B
one   start  [22, 19, 16, 20, 63, 54]
      end    [22, 19, 16, 20, 63, 54]
two   start  [22, 19, 16, 20, 63, 54]
      end    [22, 19, 16, 20, 63, 54]
three start  [22, 19, 16, 20, 63, 54]
      end    [22, 19, 16, 20, 63, 54]
```

Is there any way of breaking up the lists in C into their own columns?
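A general way to expand a column of lists into separate columns is to pass the lists to the DataFrame constructor and reuse the original index. A sketch, assuming a `df` built like the one above; `values` is a hypothetical fixed list standing in for the random draw so the result is reproducible:

```python
import numpy as np
import pandas as pd

A = np.array(["one", "one", "two", "two", "three", "three"])
B = np.array(["start", "end"] * 3)
values = [22, 19, 16, 20, 63, 54]  # fixed stand-in for np.random.randint(...)
df = pd.DataFrame(list(zip(A, B, [values] * 6)), columns=["A", "B", "C"])
df = df.set_index(["A", "B"])

# Each list element becomes its own integer-labelled column;
# the (A, B) MultiIndex is carried over as the row index.
expanded = pd.DataFrame(df["C"].tolist(), index=df.index)
print(expanded)
```

This turns each list into a row, whereas the table at the top wants each list as a column; `expanded.T` gives that orientation, with the (A, B) pairs as a column MultiIndex.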

EDIT: The structure of my C is important. It looks like the following:

```python
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]
```

The desired output is the table at the top. The numbers are the starting and ending points of subsequences within a given sequence (A, B, C are the different sequences). Depending on the sequence, a different number of subsequences satisfy the condition I'm looking for, so A, B, etc. have different numbers of start:end pairs.
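One way to handle the unequal lengths without padding by hand, offered as a sketch rather than as the approach in the answer: wrap each sublist in its own `Series` and let the DataFrame constructor align them. A dict with tuple keys produces a two-level column index, and the shorter Series are filled with NaN. `A`, `B` and `C` follow the question; `data` is a name introduced here for illustration.

```python
import pandas as pd

A = ["one", "one", "two", "two", "three", "three"]
B = ["start", "end"] * 3
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# Tuple keys -> MultiIndex columns; the Series are aligned on the union
# of their row indexes (0..3), so missing positions become NaN.
data = {(a, b): pd.Series(values) for a, b, values in zip(A, B, C)}
df = pd.DataFrame(data)
print(df)
```

Columns that contain NaN are upcast to float, so their values print as `42.0` and so on.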

asked Jun 18 '14 at 16:06 by tlnagy


1 Answer

First, I think you need to pad each sublist of C with NaN so that all sublists have the same length and the missing values are represented explicitly:

```
In [341]: max_len = max(len(sublist) for sublist in C)

In [344]: for sublist in C:
     ...:     sublist.extend([np.nan] * (max_len - len(sublist)))

In [345]: C
Out[345]:
[[7, 11, 56, 45],
 [20, 21, 74, 12],
 [42, nan, nan, nan],
 [52, nan, nan, nan],
 [90, 213, 9, nan],
 [101, 34, 45, nan]]
```
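`extend` modifies the lists in `C` in place, so the original ragged data is lost, and any other reference to those lists sees the padding too. If that matters, a variant that leaves `C` untouched builds new lists instead; `padded` is a name introduced here and can stand in for `C` in the next step.

```python
import numpy as np

C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

max_len = max(len(sublist) for sublist in C)
# sublist + [...] creates a new list, so the sublists in C keep their lengths.
padded = [sublist + [np.nan] * (max_len - len(sublist)) for sublist in C]
```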

Then convert C to a NumPy array, transpose it so that each sublist becomes a column, and pass it to the DataFrame constructor along with a column MultiIndex built from A and B:

```
In [288]: C = np.array(C)

In [289]: df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))

In [349]: df
Out[349]:
    one        two        three
  start  end  start  end  start  end
0     7   20     42   52     90  101
1    11   21    NaN  NaN    213   34
2    56   74    NaN  NaN      9   45
3    45   12    NaN  NaN    NaN  NaN
```
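As a sketch of a variant on the same idea, `itertools.zip_longest` can do the padding and the transpose in one step, without modifying `C` and without an intermediate NumPy array. `pd.MultiIndex.from_arrays([A, B])` builds the same column index as `from_tuples(zip(A, B))`; `rows` is a name introduced here for illustration.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

A = ["one", "one", "two", "two", "three", "three"]
B = ["start", "end"] * 3
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# zip_longest(*C) walks all sublists in parallel and yields one tuple per
# position, filling exhausted sublists with NaN; each tuple is one row.
rows = list(zip_longest(*C, fillvalue=np.nan))
df = pd.DataFrame(rows, columns=pd.MultiIndex.from_arrays([A, B]))
print(df)
```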
answered Oct 05 '22 at 20:10 by chrisb