pandas equivalent for R dcast

I have some data like this:

import pandas as pd
df = pd.DataFrame(index=range(1, 13), columns=['school', 'year', 'metric', 'values'])
df['school'] = ['id1']*6 + ['id2']*6
df['year'] = (['2015']*3 + ['2016']*3)*2
df['metric'] = ['tuition', 'admitsize', 'avgfinaid'] * 4
df['values'] = range(1,13)
df
   school  year     metric  values
1     id1  2015    tuition       1
2     id1  2015  admitsize       2
3     id1  2015  avgfinaid       3
4     id1  2016    tuition       4
5     id1  2016  admitsize       5
6     id1  2016  avgfinaid       6
7     id2  2015    tuition       7
8     id2  2015  admitsize       8
9     id2  2015  avgfinaid       9
10    id2  2016    tuition      10
11    id2  2016  admitsize      11
12    id2  2016  avgfinaid      12

I would like to pivot the metric & values columns to wide format. That is, I want:

school  year  tuition  admitsize  avgfinaid
   id1  2015        1          2          3
   id1  2016        4          5          6
   id2  2015        7          8          9
   id2  2016       10         11         12

If this were R, I would do something like:

df2 <- dcast(df, school + year ~ metric, value.var = "values")

How do I do this in pandas? I have read this (otherwise very helpful) SO answer and this (also otherwise excellent) example in the pandas docs, but did not grok how to apply it to my needs. I do not need a one-liner like dcast, just an example of how to get the result in a standard DataFrame (not a groupby, multi-index, or other fancy object).

994

asked May 01 '16 18:05

Don

1 Answer

You can use pivot_table():

In [23]: df2 = (df.pivot_table(index=['school', 'year'], columns='metric',
   ....:                       values='values')
   ....:          .reset_index()
   ....:       )

In [24]: df2
Out[24]:
metric school  year  admitsize  avgfinaid  tuition
0         id1  2015          2          3        1
1         id1  2016          5          6        4
2         id2  2015          8          9        7
3         id2  2016         11         12       10
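If you also want the column order from the question and no leftover "metric" label above the row index, something like the following should work. It is only a sketch: the name `wide` and the choice of `aggfunc="first"` are my assumptions, not part of the original answer. By default `pivot_table()` aggregates with the mean, which gives the same numbers here only because each school/year/metric combination occurs once.

```python
import pandas as pd

# Rebuild the long-format data from the question.
df = pd.DataFrame(
    {
        "school": ["id1"] * 6 + ["id2"] * 6,
        "year": (["2015"] * 3 + ["2016"] * 3) * 2,
        "metric": ["tuition", "admitsize", "avgfinaid"] * 4,
        "values": list(range(1, 13)),
    },
    index=range(1, 13),
)

wide = (
    df.pivot_table(index=["school", "year"], columns="metric",
                   values="values", aggfunc="first")  # assumption: one value per cell
      .reindex(columns=["tuition", "admitsize", "avgfinaid"])  # order used in the question
      .reset_index()  # turn school/year back into ordinary columns
)
wide.columns.name = None  # drop the "metric" label left on the column axis

print(wide)
```

The result is an ordinary DataFrame with a default integer index and the five columns school, year, tuition, admitsize, avgfinaid.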

148

answered Oct 18 '22 11:10

MaxU - stop WAR against UA

Tags:

python

pandas

dataframe

r

pivot-table
