 

pandas equivalent for R dcast

I have some data like this:

import pandas as pd
df = pd.DataFrame(index=range(1, 13), columns=['school', 'year', 'metric', 'values'])
df['school'] = ['id1']*6 + ['id2']*6
df['year'] = (['2015']*3 + ['2016']*3)*2
df['metric'] = ['tuition', 'admitsize', 'avgfinaid'] * 4
df['values'] = range(1,13)
df
   school  year     metric  values
1     id1  2015    tuition       1
2     id1  2015  admitsize       2
3     id1  2015  avgfinaid       3
4     id1  2016    tuition       4
5     id1  2016  admitsize       5
6     id1  2016  avgfinaid       6
7     id2  2015    tuition       7
8     id2  2015  admitsize       8
9     id2  2015  avgfinaid       9
10    id2  2016    tuition      10
11    id2  2016  admitsize      11
12    id2  2016  avgfinaid      12

I would like to pivot the metric & values columns to wide format. That is, I want:

school  year  tuition  admitsize  avgfinaid
   id1  2015        1          2          3
   id1  2016        4          5          6
   id2  2015        7          8          9
   id2  2016       10         11         12

If this were R, I would do something like:

df2 <- dcast(df, school + year ~ metric, value.var = "values")
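For comparison, the dcast formula maps onto pandas fairly directly: the variables left of "~" become the row index, the variable right of "~" becomes the new columns, and value.var names the column that fills the cells. The block below is a minimal sketch of that mapping using set_index plus unstack (a technique swapped in here for illustration, not the question's or the answer's code), assuming each (school, year, metric) combination occurs exactly once. It rebuilds the sample data so it runs on its own.

```python
import pandas as pd

# Rebuild the question's sample data so the block is self-contained.
df = pd.DataFrame({
    'school': ['id1'] * 6 + ['id2'] * 6,
    'year': (['2015'] * 3 + ['2016'] * 3) * 2,
    'metric': ['tuition', 'admitsize', 'avgfinaid'] * 4,
    'values': range(1, 13),
})

# dcast(df, school + year ~ metric, value.var = "values"):
#   left of "~"  -> row index (school, year)
#   right of "~" -> new columns (metric)
#   value.var    -> the column whose cells fill the table (values)
wide = (
    df.set_index(['school', 'year', 'metric'])['values']
      .unstack('metric')           # move 'metric' out of the index into the columns
      .reset_index()               # turn school/year back into ordinary columns
      .rename_axis(columns=None)   # drop the leftover 'metric' label on the column axis
)
print(wide)
```

Because unstack sorts the new columns, they come out as admitsize, avgfinaid, tuition rather than in the order shown in the desired output; select them explicitly if the order matters.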

How do I do this in pandas? I have read this (otherwise very helpful) SO answer and this (also otherwise excellent) example in the pandas docs, but did not grok how to apply it to my needs. I do not need a one-liner like dcast, just an example of how to get the result in a standard DataFrame (not a groupby, multi-index, or other fancy object).

Asked May 01 '16 at 18:05 by Don




1 Answer

You can use pivot_table():

In [23]: df2 = (df.pivot_table(index=['school', 'year'], columns='metric',
   ....:                       values='values')
   ....:          .reset_index()
   ....:       )

In [24]: df2
Out[24]:
metric school  year  admitsize  avgfinaid  tuition
0         id1  2015          2          3        1
1         id1  2016          5          6        4
2         id2  2015          8          9        7
3         id2  2016         11         12       10
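Two details of the output above may matter if the goal is a plain DataFrame matching the requested layout: the column axis still carries the name "metric" (the stray label in the header row), and the metric columns are sorted alphabetically. The sketch below is one way to address both, assuming pandas 1.1 or later (needed for a list-valued index in DataFrame.pivot). It swaps pivot_table for pivot, which reshapes without aggregating, and also shows how the two differ when a key is duplicated; the extra row with value 100 is made-up data for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'school': ['id1'] * 6 + ['id2'] * 6,
    'year': (['2015'] * 3 + ['2016'] * 3) * 2,
    'metric': ['tuition', 'admitsize', 'avgfinaid'] * 4,
    'values': range(1, 13),
})

# pivot() reshapes without aggregating; it raises ValueError if any
# (school, year, metric) combination occurs more than once.
wide = (
    df.pivot(index=['school', 'year'], columns='metric', values='values')
      .reset_index()
      .rename_axis(columns=None)   # remove the 'metric' header label
)

# Restore the column order from the question (pivot sorts metrics alphabetically).
wide = wide[['school', 'year', 'tuition', 'admitsize', 'avgfinaid']]
print(wide)

# Hypothetical duplicate: a second (id1, 2015, tuition) row with value 100.
dup = pd.concat([df, df.iloc[[0]].assign(values=100)], ignore_index=True)
try:
    dup.pivot(index=['school', 'year'], columns='metric', values='values')
except ValueError as exc:
    print('pivot failed:', exc)

# pivot_table() accepts the duplicate and aggregates it (default aggfunc is 'mean').
averaged = dup.pivot_table(index=['school', 'year'], columns='metric', values='values')
print(averaged.loc[('id1', '2015'), 'tuition'])  # mean of 1 and 100
```

The design trade-off: pivot fails loudly on duplicate keys, while pivot_table silently averages them unless an explicit aggfunc (for example 'first' or 'sum') is passed.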
Answered Oct 18 '22 at 11:10 by MaxU - stop WAR against UA