I'm having difficulty constructing a 3D DataFrame in Pandas. I want something like this

    A               B               C
    start    end    start    end    start    end ...
    7        20     42       52     90       101
    11       21                     213      34
    56       74                     9        45
    45       12

Here A, B, etc. are the top-level descriptors, and start and end are subdescriptors. The numbers below them come in start/end pairs, and A, B, etc. do not all have the same number of pairs: A has four pairs, B has only one, and C has three.
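The layout sketched above is really a 2D table whose column header has two levels. In pandas that is usually expressed with a column MultiIndex rather than a true 3D object. Here is a minimal sketch of just the header, assuming the labels A/B/C and start/end from the sketch (the names `columns` and `skeleton` are illustrative):

```python
import pandas as pd

# Top-level descriptors (one per sequence) crossed with the two subdescriptors.
columns = pd.MultiIndex.from_product([["A", "B", "C"], ["start", "end"]])

# An empty frame that only carries the two-level header; rows come later.
skeleton = pd.DataFrame(columns=columns)

print(list(columns))
# [('A', 'start'), ('A', 'end'), ('B', 'start'), ('B', 'end'), ('C', 'start'), ('C', 'end')]

# Selecting a top-level label returns the sub-frame with only start/end columns.
print(list(skeleton["A"].columns))  # ['start', 'end']
```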
I'm not sure how to proceed in constructing this DataFrame. Modifying this example didn't give me the desired output:

    import numpy as np
    import pandas as pd

    A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
    B = np.array(['start', 'end']*3)
    C = [np.random.randint(10, 99, 6)]*6
    df = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
    df.set_index(['A', 'B'], inplace=True)
    df

yielded:

                                   C
    A      B
    one    start  [22, 19, 16, 20, 63, 54]
           end    [22, 19, 16, 20, 63, 54]
    two    start  [22, 19, 16, 20, 63, 54]
           end    [22, 19, 16, 20, 63, 54]
    three  start  [22, 19, 16, 20, 63, 54]
           end    [22, 19, 16, 20, 63, 54]

Is there any way of breaking up the lists in C into their own columns?
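For equal-length lists like the ones in this first attempt, one common way to split a list-valued column is to pass its `.tolist()` to the DataFrame constructor and reuse the original index. This is a sketch that rebuilds the frame with fixed values instead of random ones; it does not by itself address the ragged C from the edit below:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame, with the same list stored in every cell of "C".
A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
B = np.array(['start', 'end'] * 3)
C = [[22, 19, 16, 20, 63, 54]] * 6  # fixed values in place of np.random.randint
df = pd.DataFrame(list(zip(A, B, C)), columns=['A', 'B', 'C']).set_index(['A', 'B'])

# Each list becomes one row of a new frame; columns are numbered 0..5.
expanded = pd.DataFrame(df['C'].tolist(), index=df.index)
print(expanded)
```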
EDIT: The structure of my C is important. It looks like the following:

    C = [[7,11,56,45], [20,21,74,12], [42], [52], [90,213,9], [101, 34, 45]]

The desired output is the one at the top. It represents the starting and ending points of subsequences within a certain sequence (A, B, and C are the different sequences). Depending on the sequence, a different number of subsequences satisfy the condition I'm looking for, so A, B, etc. have different numbers of start:end pairs.
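Read this way, the even-indexed sublists of C are start positions and the odd-indexed ones are the matching end positions. A small sketch that pairs them up per sequence, assuming the names A, B, C from the sketch at the top (`names` and `pairs` are illustrative):

```python
# C as given in the edit: start list, end list, start list, end list, ...
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]
names = ["A", "B", "C"]  # sequence names taken from the sketch at the top

# C[0::2] are the start lists, C[1::2] the matching end lists.
pairs = {name: list(zip(starts, ends))
         for name, starts, ends in zip(names, C[0::2], C[1::2])}

print(pairs)
# {'A': [(7, 20), (11, 21), (56, 74), (45, 12)], 'B': [(42, 52)], 'C': [(90, 101), (213, 34), (9, 45)]}
print({name: len(p) for name, p in pairs.items()})  # {'A': 4, 'B': 1, 'C': 3}
```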
Introduction to Pandas 3D DataFrame. Representing 3D data in pandas has always been awkward, but the DataFrame plot() function makes it easy to produce reasonable-looking plots from a DataFrame. 3D plotting in Matplotlib starts by enabling the mplot3d toolkit.
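The pandas Panel. A Panel is a 3D container of data. It is not as widely used as Series or DataFrame, and because of its 3D nature it is harder to display on screen or visualize than the other two. It was generally used for 3D time-series data. Note that Panel was deprecated in pandas 0.20 and removed in pandas 1.0; the pandas documentation recommends representing 3D data with a MultiIndex DataFrame or with the xarray package instead.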
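As a rough illustration of the MultiIndex replacement for a Panel, the sketch below folds the first two axes of a 3D NumPy array into a row MultiIndex and keeps the last axis as columns. The items, dates, and field names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical 3D data: 2 items x 3 dates x 2 fields (what a Panel used to hold).
items = ["x", "y"]
dates = pd.date_range("2020-01-01", periods=3)
fields = ["open", "close"]
cube = np.arange(2 * 3 * 2).reshape(len(items), len(dates), len(fields))

# reshape(-1, n_fields) walks items in the outer loop and dates in the inner
# loop, which matches the order produced by MultiIndex.from_product.
index = pd.MultiIndex.from_product([items, dates], names=["item", "date"])
frame = pd.DataFrame(cube.reshape(-1, len(fields)), index=index, columns=fields)

# Selecting one item gives back a 2D date x field slice.
print(frame.loc["y"])
```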
DataFrame. DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Each of its columns is a pandas Series: a one-dimensional labeled array capable of holding any data type, whose axis labels are called the index.
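A tiny sketch of both structures; the labels and values are made up purely for illustration:

```python
import pandas as pd

# A Series: one-dimensional values plus axis labels (the index).
s = pd.Series([7, 11, 56], index=["a", "b", "c"])

# A DataFrame: labeled rows and columns; each column is a Series with its own dtype.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "pairs": [4, 1, 3],
    "score": [0.5, 1.0, 0.75],
})

print(s["b"])                      # 11
print(df.dtypes)                   # "pairs" is an integer column, "score" a float column
print(type(df["pairs"]).__name__)  # Series
```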
First, I think you need to pad C with NaN so that every sublist has the same length:

    In [341]: max_len = max(len(sublist) for sublist in C)

    In [344]: for sublist in C:
         ...:     sublist.extend([np.nan] * (max_len - len(sublist)))

    In [345]: C
    Out[345]:
    [[7, 11, 56, 45],
     [20, 21, 74, 12],
     [42, nan, nan, nan],
     [52, nan, nan, nan],
     [90, 213, 9, nan],
     [101, 34, 45, nan]]

Then, convert to a NumPy array, transpose, and pass it to the DataFrame constructor along with the columns.
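The in-place `extend` loop modifies C itself. If you would rather keep the original ragged lists, `itertools.zip_longest` can transpose and pad in a single step, so no separate NumPy transpose is needed. This is a sketch of that swapped-in technique, assuming the A and B arrays from the question:

```python
import itertools

import numpy as np
import pandas as pd

A = ['one', 'one', 'two', 'two', 'three', 'three']
B = ['start', 'end'] * 3
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# Each output tuple holds the i-th element of every sublist (already transposed);
# sublists that are too short contribute NaN instead.
rows = list(itertools.zip_longest(*C, fillvalue=np.nan))

df = pd.DataFrame(rows, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))
print(df)
```

Because C is never mutated, it can still be reused afterwards in its original ragged form.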

    In [288]: C = np.array(C)

    In [289]: df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(zip(A,B)))

    In [349]: df
    Out[349]:
        one       two       three
      start  end start  end start  end
    0     7   20    42   52    90  101
    1    11   21   NaN  NaN   213   34
    2    56   74   NaN  NaN     9   45
    3    45   12   NaN  NaN   NaN  NaN
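An alternative that skips manual padding altogether (a different technique from the answer's, sketched here under the same A, B, and C): build a dict of Series keyed by (sequence, start/end) tuples. Tuple keys become a two-level column MultiIndex, and pandas aligns Series of different lengths on their index, filling the gaps with NaN.

```python
import pandas as pd

A = ['one', 'one', 'two', 'two', 'three', 'three']
B = ['start', 'end'] * 3
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# Each Series has a default 0..n-1 index; alignment pads the shorter ones with NaN.
df = pd.DataFrame({(a, b): pd.Series(c) for a, b, c in zip(A, B, C)})
print(df)

# Selecting one top-level label and dropping NaN rows recovers that sequence's pairs.
two_pairs = df['two'].dropna()
print(two_pairs)
```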