I have some data like this:
import pandas as pd
df = pd.DataFrame(index=range(1, 13), columns=['school', 'year', 'metric', 'values'])
df['school'] = ['id1']*6 + ['id2']*6
df['year'] = (['2015']*3 + ['2016']*3)*2
df['metric'] = ['tuition', 'admitsize', 'avgfinaid'] * 4
df['values'] = range(1,13)
df
school year metric values
1 id1 2015 tuition 1
2 id1 2015 admitsize 2
3 id1 2015 avgfinaid 3
4 id1 2016 tuition 4
5 id1 2016 admitsize 5
6 id1 2016 avgfinaid 6
7 id2 2015 tuition 7
8 id2 2015 admitsize 8
9 id2 2015 avgfinaid 9
10 id2 2016 tuition 10
11 id2 2016 admitsize 11
12 id2 2016 avgfinaid 12
I would like to pivot the metric & values columns to wide format. That is, I want:
school year tuition admitsize avgfinaid
id1 2015 1 2 3
id1 2016 4 5 6
id2 2015 7 8 9
id2 2016 10 11 12
If this were R, I would do something like:
df2 <- dcast(df, school + year ~ metric, value.var = "values")
How do I do this in pandas? I have read this (otherwise very helpful) SO answer and this (also otherwise excellent) example in the pandas docs, but did not grok how to apply it to my needs. I do not need a one-liner like dcast, just an example of how to get the result in a standard DataFrame (not a groupby, multi-index, or other fancy object).
You can use pivot_table():
df2 = (df.pivot_table(index=['school', 'year'], columns='metric',
                      values='values')
         .reset_index())
df2
metric school year admitsize avgfinaid tuition
0 id1 2015 2 3 1
1 id1 2016 5 6 4
2 id2 2015 8 9 7
3 id2 2016 11 12 10
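Because every (school, year, metric) combination in the sample data occurs exactly once, DataFrame.pivot() can build the same wide table without any aggregation, whereas pivot_table() averages duplicates by default (aggfunc='mean'). The sketch below assumes pandas 1.1 or later, since older versions do not accept a list for pivot()'s index argument; the names wide, dup and raised are illustrative only. It also clears the leftover metric label on the column axis so the result is a plain, flat DataFrame, and restores the metric order used in the question.

```python
import pandas as pd

# Rebuild the sample data from the question (same values, default RangeIndex).
df = pd.DataFrame({
    'school': ['id1'] * 6 + ['id2'] * 6,
    'year': (['2015'] * 3 + ['2016'] * 3) * 2,
    'metric': ['tuition', 'admitsize', 'avgfinaid'] * 4,
    'values': range(1, 13),
})

# pivot() only reshapes: one row per (school, year), one column per metric.
wide = (
    df.pivot(index=['school', 'year'], columns='metric', values='values')
      .reset_index()              # turn school/year back into ordinary columns
      .rename_axis(columns=None)  # drop the 'metric' name left on the column axis
)

# Optional: put the metric columns back in the order used in the question.
wide = wide[['school', 'year', 'tuition', 'admitsize', 'avgfinaid']]
print(wide)

# Unlike pivot_table(), pivot() refuses duplicate keys instead of averaging them.
dup = pd.concat([df, df.head(1)], ignore_index=True)  # hypothetical duplicated row
raised = False
try:
    dup.pivot(index=['school', 'year'], columns='metric', values='values')
except ValueError as err:
    raised = True
    print('pivot() refused duplicates:', err)
```

In short, pivot() is the stricter choice when the data should have exactly one value per cell, while pivot_table() is the one to reach for when repeated keys need to be aggregated.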