Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas pivot dataframe to 3d data

Tags:

python

pandas

There seem to be a lot of possibilities to pivot flat table data into a 3d array but I'm somehow not finding one that works: Suppose I have some data with columns=['name', 'type', 'date', 'value']. When I try to pivot via

pivot(index='name', columns=['type', 'date'], values='value')

I get

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Am I reading docs from dev pandas maybe? It seems like this is the usage described there. I am running 0.8 pandas.

I guess, I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel? I can use groupby and get the job done, but then this is almost like what I would do in numpy to assemble an n-d array. Seems like a fairly generic operation so I would imagine it might be implemented already.

like image 261
mathtick Avatar asked Jan 27 '26 15:01

mathtick


2 Answers

pivot only supports using a single column to generate your columns. You probably want to use pivot_table to generate a pivot table using multiple columns e.g.

pandas.tools.pivot.pivot_table(your_dataframe, values='value', index='name', columns=['type', 'date'], aggfunc='sum')

The hierarchical columns that are mentioned in the API reference and documentation for pivot relates to cases where you have multiple value fields rather than multiple categories.

Assuming 'type' and 'date' are categories, whose values should be used as the column names, then you should use pivot_table.

However, if you want separate columns for different value fields for the same category (e.g. 'type'), then you should use pivot without specifying the value column and your category as the columns parameter.

For example, suppose you have this DataFrame:

df = DataFrame({'name': ['A', 'B', 'A', 'B'], 'type': [1, 1, 2, 2], 'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'],  'value': [1, 2, 3, 4]})

pt = df.pivot_table(values='value', index='name', columns=['type', 'date'])
p = df.pivot('name', 'type')

pt will be:

type           1           2
date  2012-01-01  2012-02-01
name                        
A              1           3
B              2           4

and p will be:

          date              value   
type           1           2      1  2
name                                  
A     2012-01-01  2012-02-01      1  3
B     2012-01-01  2012-02-01      2  4

NOTE: For pandas version < 0.14.0, the index and columns keyword arguments should be replaced with rows and cols respectively.

like image 185
Matti John Avatar answered Jan 30 '26 04:01

Matti John


The original post ended with the question:

"I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel?"

to which I was looking for a solution myself.

I ended up with the following:

In [1]: import pandas as pd

## generate xyz example:
In [3]: df = pd.DataFrame({col:pd.np.random.randint(0,10,10) 
                               for col in ['x','y','z','data']})

## set all x,y,z coordinates as indices
In [5]: df.set_index(['x','y','z'], inplace=True)

## set the z coordinate as headers of the columns 
# NB: this is will turn the data into "dense" with NaNs where there were no 'data'
In [7]: df = df['data'].unstack()

## now it is ready to be "pivot"ed into a panel
In [9]: data_panel = df.to_panel()

In [10]: df
Out[10]: 
     data                        
z       1   3   4   5   6   7   9
x y                              
1 5   NaN NaN NaN NaN NaN NaN   1
  6   NaN NaN NaN NaN NaN NaN   0
2 9   NaN NaN NaN NaN NaN   1 NaN
3 9     6 NaN NaN NaN NaN NaN NaN
5 9   NaN NaN NaN NaN NaN NaN   8
7 1   NaN NaN NaN NaN   8 NaN NaN
  3   NaN NaN NaN NaN NaN NaN   5
  7   NaN NaN NaN   1 NaN NaN NaN
  9   NaN   0 NaN NaN NaN NaN NaN
9 5   NaN NaN   1 NaN NaN NaN NaN

[10 rows x 7 columns]

In [11]: data_panel
Out[11]: 
<class 'pandas.core.panel.Panel'>
Dimensions: 7 (items) x 6 (major_axis) x 6 (minor_axis)
Items axis: 1 to 9
Major_axis axis: 1 to 9
Minor_axis axis: 1 to 9

The columns headers will be the Items of the Panel, the first level index with be the MajorAxis (rows) and the second level will be the MinorAxis (columns)

like image 43
eldad-a Avatar answered Jan 30 '26 05:01

eldad-a



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!