Wide to long data transform in pandas

Tags: python, pandas
I have a dataset in the following format:

county   area    pop_2006    pop_2007    pop_2008
01001    275      1037         1052        1102
01003    394      2399         2424        2438
01005    312      1638         1647        1660

And I need it in a format like this:

county    year   pop      area
01001     2006   1037      275
01001     2007   1052      275
01001     2008   1102      275
01003     2006   2399      394
01003     2007   2424      394
...

I've tried every combination of pivot_table, stack, unstack, wide_to_long that I can think of, with no success yet. (clearly I'm mostly illiterate in Python/pandas, so please be gentle...).

asked May 24 '16 15:05

PatrickC

2 Answers

You can use melt for reshaping, then split the resulting variable column on '_' to get the year, drop the helper columns, and sort with sort_values. Finally, cast year to int with astype and reorder the columns by selecting them in the desired order:

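# Unpivot the pop_* columns; melt names the label column 'variable' by default
df1 = pd.melt(df, id_vars=['county', 'area'], value_name='pop')
# Split 'pop_2006' into a throwaway prefix and the year
df1[['tmp', 'year']] = df1['variable'].str.split('_', expand=True)
df1 = df1.drop(['variable', 'tmp'], axis=1).sort_values(['county', 'year'])
df1['year'] = df1['year'].astype(int)
# Reorder the columns to match the requested layout
df1 = df1[['county', 'year', 'pop', 'area']]
print(df1)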
   county  year   pop  area
0    1001  2006  1037   275
3    1001  2007  1052   275
6    1001  2008  1102   275
1    1003  2006  2399   394
4    1003  2007  2424   394
7    1003  2008  2438   394
2    1005  2006  1638   312
5    1005  2007  1647   312
8    1005  2008  1660   312

print(df1.dtypes)
county    int64
year      int32
pop       int64
area      int64
dtype: object
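
Note that county prints as 1001 rather than 01001 above, presumably because the column was parsed as an integer. If the leading zeros matter, here is a minimal self-contained sketch of the same melt approach with county kept as a string. The sample frame is rebuilt by hand from the question, and the name long_df is only illustrative:

```python
import pandas as pd

# Sample data from the question. Assumption: county is stored as a string
# so that leading zeros such as "01001" survive.
df = pd.DataFrame({
    "county": ["01001", "01003", "01005"],
    "area": [275, 394, 312],
    "pop_2006": [1037, 2399, 1638],
    "pop_2007": [1052, 2424, 1647],
    "pop_2008": [1102, 2438, 1660],
})

# Unpivot the pop_* columns, naming the label column "year" directly.
long_df = df.melt(id_vars=["county", "area"], var_name="year", value_name="pop")

# Strip the literal "pop_" prefix and convert what is left to an integer year.
long_df["year"] = long_df["year"].str.replace("pop_", "", regex=False).astype(int)

# Sort, renumber the rows, and put the columns in the requested order.
long_df = (
    long_df.sort_values(["county", "year"])
    .reset_index(drop=True)[["county", "year", "pop", "area"]]
)
print(long_df)
```

When reading from a file, passing something like dtype={"county": str} to the reader should keep the codes as text, assuming the data comes from a CSV.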

Another solution with set_index, stack and reset_index:

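# Move the id columns into the index, then stack the pop_* columns into rows
df2 = df.set_index(['county', 'area']).stack().reset_index(name='pop')
# The stacked column labels land in 'level_2'; split off the year
df2[['tmp', 'year']] = df2['level_2'].str.split('_', expand=True)
df2 = df2.drop(['level_2', 'tmp'], axis=1)
df2['year'] = df2['year'].astype(int)
df2 = df2[['county', 'year', 'pop', 'area']]

print(df2)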
   county  year   pop  area
0    1001  2006  1037   275
1    1001  2007  1052   275
2    1001  2008  1102   275
3    1003  2006  2399   394
4    1003  2007  2424   394
5    1003  2008  2438   394
6    1005  2006  1638   312
7    1005  2007  1647   312
8    1005  2008  1660   312

print(df2.dtypes)
county    int64
year      int32
pop       int64
area      int64
dtype: object
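
A possible variation on the stack approach, as a sketch only: turn the column labels into integer years before stacking, so no temporary columns are needed afterwards. The sample frame is rebuilt from the question, and the names wide and long_df are illustrative:

```python
import pandas as pd

# Sample data from the question (county kept as a string by assumption).
df = pd.DataFrame({
    "county": ["01001", "01003", "01005"],
    "area": [275, 394, 312],
    "pop_2006": [1037, 2399, 1638],
    "pop_2007": [1052, 2424, 1647],
    "pop_2008": [1102, 2438, 1660],
})

# Only the pop_* columns remain as columns after this step.
wide = df.set_index(["county", "area"])

# Relabel "pop_2006" -> 2006 and name the column axis "year", so the
# stacked index level gets that name instead of "level_2".
wide.columns = pd.Index([int(c.split("_")[1]) for c in wide.columns], name="year")

# Stack the year columns into rows and restore a flat frame.
long_df = wide.stack().reset_index(name="pop")[["county", "year", "pop", "area"]]
print(long_df)
```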

answered Oct 21 '22 22:10

jezrael

As the question title suggests, we can use pd.wide_to_long:

res = pd.wide_to_long(df, stubnames="pop", i=["county", "area"], j="year", sep="_")

to get

                   pop
county area year
1001   275  2006  1037
            2007  1052
            2008  1102
1003   394  2006  2399
            2007  2424
            2008  2438
1005   312  2006  1638
            2007  1647
            2008  1660

To exactly match the output format in the question, a reset_index and reindex (over columns) can be chained:

>>> res.reset_index().reindex(["county", "year", "pop", "area"], axis=1)

   county  year   pop  area
0    1001  2006  1037   275
1    1001  2007  1052   275
2    1001  2008  1102   275
3    1003  2006  2399   394
4    1003  2007  2424   394
5    1003  2008  2438   394
6    1005  2006  1638   312
7    1005  2007  1647   312
8    1005  2008  1660   312
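
For anyone who wants to run this end to end, here is a self-contained sketch of the wide_to_long route. The sample frame is rebuilt by hand from the question, county is kept as a string by assumption, and the explicit sort is added only to make the row order deterministic:

```python
import pandas as pd

# Sample data from the question (county kept as a string by assumption).
df = pd.DataFrame({
    "county": ["01001", "01003", "01005"],
    "area": [275, 394, 312],
    "pop_2006": [1037, 2399, 1638],
    "pop_2007": [1052, 2424, 1647],
    "pop_2008": [1102, 2438, 1660],
})

# stubnames="pop" with sep="_" matches columns named pop_<suffix>;
# the default suffix pattern only accepts digits.
res = pd.wide_to_long(df, stubnames="pop", i=["county", "area"], j="year", sep="_")

# Flatten the MultiIndex, sort, and order the columns as requested.
out = (
    res.reset_index()
    .sort_values(["county", "year"])
    .reset_index(drop=True)[["county", "year", "pop", "area"]]
)
print(out)
```

The i columns have to identify each row uniquely, which holds here because every county appears once in the wide table.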

answered Oct 21 '22 23:10

Mustafa Aydın
