
Long/wide data to wide/long

I have a data frame that looks as follows:

import pandas as pd
d = {'decil': ['1. decil','1. decil','2. decil','2. decil','3. decil','3. decil'],
    'kommune': ['AA','BB','AA','BB','AA','BB'],'2010':[44,25,242,423,845,962],
    '2011':[64,26,239,620,862,862]}    
df = pd.DataFrame(data=d)

Printing it gives:

decil      kommune  2010  2011
1. decil   AA       44    64
1. decil   BB       25    26
2. decil   AA      242   239
2. decil   BB      423   620
3. decil   AA      845   862
3. decil   BB      962   862

My desired output is something like this:

 kommune  year  1. decil  2. decil  3. decil
 AA       2010        44       242       845
 AA       2011        64       239       862
 BB       2010        25       423       962
 BB       2011        26       620       862

That is, I'm looking for a way to change the 'decil' column from long to wide format while at the same time changing the year columns from wide to long format. I have tried pd.pivot_table, loops and unstack without any luck. Is there a smart way to do this? Thanks in advance for the help.

A.joh asked Sep 27 '18 10:09



1 Answer

Use set_index with stack and unstack:

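df = (df.set_index(['decil','kommune'])   # leave only the year columns as data
        .stack()                          # year columns -> innermost index level (long)
        .unstack(0)                       # 'decil' index level -> columns (wide)
        .reset_index()                    # 'kommune' and the year level back to columns
        .rename_axis(None, axis=1))       # drop the leftover 'decil' columns name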

print (df)
  kommune level_1  1. decil  2. decil  3. decil
0      AA    2010        44       242       845
1      AA    2011        64       239       862
2      BB    2010        25       423       962
3      BB    2011        26       620       862
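
In the output above the year column is called level_1, because the stacked level has no name. A minimal variation of the same chain, assuming you want that column named 'year' as in the desired output, is to name the columns axis before stacking. The variable name out is only illustrative, and the frame is rebuilt from the question so the sketch runs on its own:

```python
import pandas as pd

# Same data as in the question.
d = {'decil': ['1. decil', '1. decil', '2. decil', '2. decil', '3. decil', '3. decil'],
     'kommune': ['AA', 'BB', 'AA', 'BB', 'AA', 'BB'],
     '2010': [44, 25, 242, 423, 845, 962],
     '2011': [64, 26, 239, 620, 862, 862]}
df = pd.DataFrame(data=d)

out = (df.set_index(['decil', 'kommune'])
         .rename_axis('year', axis=1)   # name the columns axis so the stacked level is 'year'
         .stack()                       # year columns -> index level 'year'
         .unstack('decil')              # 'decil' index level -> columns
         .reset_index()                 # 'kommune' and 'year' back to ordinary columns
         .rename_axis(None, axis=1))    # drop the leftover 'decil' columns name

print(out)
```

Note that the year values stay strings ('2010', '2011'), since they come from the original column labels; convert them with astype(int) if numeric years are needed.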
jezrael answered Sep 28 '22 01:09