There seem to be a lot of possibilities to pivot flat table data into a 3d array but I'm somehow not finding one that works: Suppose I have some data with columns=['name', 'type', 'date', 'value']. When I try to pivot via
pivot(index='name', columns=['type', 'date'], values='value')
I get
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
Am I reading docs from dev pandas maybe? It seems like this is the usage described there. I am running 0.8 pandas.
I guess, I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel? I can use groupby and get the job done, but then this is almost like what I would do in numpy to assemble an n-d array. Seems like a fairly generic operation so I would imagine it might be implemented already.
pivot only supports using a single column to generate your columns. You probably want to use pivot_table to generate a pivot table using multiple columns e.g.
pandas.tools.pivot.pivot_table(your_dataframe, values='value', index='name', columns=['type', 'date'], aggfunc='sum')
The hierarchical columns that are mentioned in the API reference and documentation for pivot relates to cases where you have multiple value fields rather than multiple categories.
Assuming 'type' and 'date' are categories, whose values should be used as the column names, then you should use pivot_table.
However, if you want separate columns for different value fields for the same category (e.g. 'type'), then you should use pivot without specifying the value column and your category as the columns parameter.
For example, suppose you have this DataFrame:
df = DataFrame({'name': ['A', 'B', 'A', 'B'], 'type': [1, 1, 2, 2], 'date': ['2012-01-01', '2012-01-01', '2012-02-01', '2012-02-01'], 'value': [1, 2, 3, 4]})
pt = df.pivot_table(values='value', index='name', columns=['type', 'date'])
p = df.pivot('name', 'type')
pt will be:
type 1 2
date 2012-01-01 2012-02-01
name
A 1 3
B 2 4
and p will be:
date value
type 1 2 1 2
name
A 2012-01-01 2012-02-01 1 3
B 2012-01-01 2012-02-01 2 4
NOTE: For pandas version < 0.14.0, the index and columns keyword arguments should be replaced with rows and cols respectively.
The original post ended with the question:
"I'm wondering if I have a MultiIndex ['x', 'y', 'z'] Series, is there a pandas way to put that in a panel?"
to which I was looking for a solution myself.
I ended up with the following:
In [1]: import pandas as pd
## generate xyz example:
In [3]: df = pd.DataFrame({col:pd.np.random.randint(0,10,10)
for col in ['x','y','z','data']})
## set all x,y,z coordinates as indices
In [5]: df.set_index(['x','y','z'], inplace=True)
## set the z coordinate as headers of the columns
# NB: this is will turn the data into "dense" with NaNs where there were no 'data'
In [7]: df = df['data'].unstack()
## now it is ready to be "pivot"ed into a panel
In [9]: data_panel = df.to_panel()
In [10]: df
Out[10]:
data
z 1 3 4 5 6 7 9
x y
1 5 NaN NaN NaN NaN NaN NaN 1
6 NaN NaN NaN NaN NaN NaN 0
2 9 NaN NaN NaN NaN NaN 1 NaN
3 9 6 NaN NaN NaN NaN NaN NaN
5 9 NaN NaN NaN NaN NaN NaN 8
7 1 NaN NaN NaN NaN 8 NaN NaN
3 NaN NaN NaN NaN NaN NaN 5
7 NaN NaN NaN 1 NaN NaN NaN
9 NaN 0 NaN NaN NaN NaN NaN
9 5 NaN NaN 1 NaN NaN NaN NaN
[10 rows x 7 columns]
In [11]: data_panel
Out[11]:
<class 'pandas.core.panel.Panel'>
Dimensions: 7 (items) x 6 (major_axis) x 6 (minor_axis)
Items axis: 1 to 9
Major_axis axis: 1 to 9
Minor_axis axis: 1 to 9
The columns headers will be the Items of the Panel, the first level index with be the MajorAxis (rows) and the second level will be the MinorAxis (columns)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With