Constructing 3D Pandas DataFrame

Tags:

pandas

I'm having difficulty constructing a 3D DataFrame in Pandas. I want something like this:

    A               B               C
    start    end    start    end    start    end ...
    7        20     42       52     90       101
    11       21                     213      34
    56       74                     9        45
    45       12

Here A, B, etc. are the top-level descriptors, and start and end are subdescriptors. The numbers that follow come in pairs, and A, B, etc. do not all have the same number of pairs: A has four, B has only one, and C has three.

I'm not sure how to proceed in constructing this DataFrame. Modifying this example didn't give me the desired output:

    import numpy as np
    import pandas as pd

    A = np.array(['one', 'one', 'two', 'two', 'three', 'three'])
    B = np.array(['start', 'end']*3)
    C = [np.random.randint(10, 99, 6)]*6
    df = pd.DataFrame(zip(A, B, C), columns=['A', 'B', 'C'])
    df.set_index(['A', 'B'], inplace=True)
    df

yielded:

                    C
    A      B
    one    start   [22, 19, 16, 20, 63, 54]
           end     [22, 19, 16, 20, 63, 54]
    two    start   [22, 19, 16, 20, 63, 54]
           end     [22, 19, 16, 20, 63, 54]
    three  start   [22, 19, 16, 20, 63, 54]
           end     [22, 19, 16, 20, 63, 54]

Is there any way of breaking up the lists in C into their own columns?

EDIT: The structure of my C is important. It looks like the following:

 C = [[7,11,56,45], [20,21,74,12], [42], [52], [90,213,9], [101, 34, 45]]

And the desired output is the one at the top. It represents the starting and ending points of subsequences within a certain sequence (A, B, C are the different sequences). Depending on the sequence itself, a different number of subsequences satisfy the condition I'm looking for. As a result, there is a different number of start:end pairs for A, B, etc.


asked Jun 18 '14 16:06

tlnagy

1 Answer

First, I think you need to pad the sublists in C with NaN to represent the missing values, so that they all have the same length:

    In [341]: max_len = max(len(sublist) for sublist in C)

    In [344]: for sublist in C:
         ...:     sublist.extend([np.nan] * (max_len - len(sublist)))

    In [345]: C
    Out[345]:
    [[7, 11, 56, 45],
     [20, 21, 74, 12],
     [42, nan, nan, nan],
     [52, nan, nan, nan],
     [90, 213, 9, nan],
     [101, 34, 45, nan]]
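As a possible alternative to the in-place loop, `itertools.zip_longest` from the standard library can pad the short sublists and transpose them in one step. This is only a sketch and not part of the original answer; the name `rows` is illustrative.

```python
from itertools import zip_longest

import numpy as np

C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# zip_longest walks the sublists in parallel and fills the short ones with NaN,
# so each resulting tuple is already one row of the transposed table.
rows = list(zip_longest(*C, fillvalue=np.nan))

print(len(rows))  # one row per position in the longest sublist
print(rows[0])    # first start/end values of every sequence
```

Unlike the loop, this leaves the original `C` unmodified, which may matter if the unpadded lists are needed later.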

Then, convert to a numpy array, transpose, and pass to the DataFrame constructor along with the columns.

    In [288]: C = np.array(C)

    In [289]: df = pd.DataFrame(data=C.T, columns=pd.MultiIndex.from_tuples(list(zip(A, B))))

    In [349]: df
    Out[349]:
         one         two       three
       start  end  start  end  start  end
    0      7   20     42   52     90  101
    1     11   21    NaN  NaN    213   34
    2     56   74    NaN  NaN      9   45
    3     45   12    NaN  NaN    NaN  NaN
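For reference, the whole approach can be put together as a standalone script. This is a minimal sketch assuming the `A`, `B`, and `C` from the question; the names `padded`, `columns`, and `pairs_two` are illustrative, and the padding is done on copies instead of in place.

```python
import numpy as np
import pandas as pd

A = ['one', 'one', 'two', 'two', 'three', 'three']
B = ['start', 'end'] * 3
C = [[7, 11, 56, 45], [20, 21, 74, 12], [42], [52], [90, 213, 9], [101, 34, 45]]

# Pad copies of the sublists with NaN so every column has the same length.
max_len = max(len(sublist) for sublist in C)
padded = [sublist + [np.nan] * (max_len - len(sublist)) for sublist in C]

# Each padded sublist becomes one column; list(zip(...)) materializes the
# (top-level, sub-level) pairs used to build the column MultiIndex.
columns = pd.MultiIndex.from_tuples(list(zip(A, B)))
df = pd.DataFrame(np.array(padded).T, columns=columns)

# Selecting a top-level label returns its start/end columns;
# dropna() then removes the rows that exist only as padding.
pairs_two = df['two'].dropna()
print(df)
print(pairs_two)
```

Because NaN is a float, every column ends up with a float dtype. Selecting `df['one']`, `df['two']`, or `df['three']` and calling `dropna()` recovers only the real start/end pairs for that sequence.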


answered Oct 05 '22 20:10

chrisb
