Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: dataframe to long format

Tags:

python

pandas

I have the following df:

tz.head()

    state   2004    2005    2006    2007    2008    2009    2010    2011    2012    2013    2014    2015
0   AL      5.7     4.5     4.0     4.0     5.7     11.0    10.5    9.6     8.0     7.2     6.8     6.1
1   AK      7.5     6.9     6.6     6.3     6.7     7.7     7.9     7.6     7.1     6.9     6.9     6.5
2   AZ      5.0     4.7     4.2     3.9     6.2     9.9     10.4    9.5     8.3     7.7     6.8     6.1
3   AR      5.7     5.2     5.2     5.3     5.5     7.8     8.2     8.3     7.6     7.3     6.1     5.2
4   CA      6.2     5.4     4.9     5.4     7.3     11.2    12.2    11.7    10.4    8.9     7.5     6.2

I would like to change it so that looks like this:

year    state   unemployment
2004    AL  5.7
2005    AL  4.5
2006    AL  4.0
2007    AL  4.0
2008    AL  5.7
2009    AL  11.0
2010    AL  10.5
2011    AL  9.6
2012    AL  8.0
2013    AL  7.2
2014    AL  6.8
2015    AL  6.1
2004    AK  7.5
2005    AK  6.9
2006    AK  6.6
2007    AK  6.3
2008    AK  6.7
2009    AK  7.7
2010    AK  7.9
2011    AK  7.6
2012    AK  7.1
2013    AK  6.9
2014    AK  6.9
2015    AK  6.5

The reason is that I have a df that is similarly shaped and I need to merge the two dfs. I have recently had similar df shaping issues that I have been unable to find simple quick solutions to with python. Does anyone know how to solve this kind of problem?

like image 599
Collective Action Avatar asked Apr 21 '16 08:04

Collective Action


2 Answers

You can use melt:

print pd.melt(df,id_vars=['state'],var_name='year', value_name='unemployment')

   state  year  unemployment
0     AL  2004           5.7
1     AK  2004           7.5
2     AZ  2004           5.0
3     AR  2004           5.7
4     CA  2004           6.2
5     AL  2005           4.5
6     AK  2005           6.9
7     AZ  2005           4.7
8     AR  2005           5.2
9     CA  2005           5.4
10    AL  2006           4.0
11    AK  2006           6.6
12    AZ  2006           4.2
13    AR  2006           5.2
14    CA  2006           4.9
15    AL  2007           4.0
16    AK  2007           6.3
17    AZ  2007           3.9
18    AR  2007           5.3
19    CA  2007           5.4
20    AL  2008           5.7
21    AK  2008           6.7
22    AZ  2008           6.2
23    AR  2008           5.5
24    CA  2008           7.3
25    AL  2009          11.0
26    AK  2009           7.7
27    AZ  2009           9.9
28    AR  2009           7.8
29    CA  2009          11.2
30    AL  2010          10.5
31    AK  2010           7.9
32    AZ  2010          10.4
33    AR  2010           8.2
34    CA  2010          12.2
35    AL  2011           9.6
36    AK  2011           7.6
37    AZ  2011           9.5
38    AR  2011           8.3
39    CA  2011          11.7
40    AL  2012           8.0
41    AK  2012           7.1
42    AZ  2012           8.3
43    AR  2012           7.6
44    CA  2012          10.4
45    AL  2013           7.2
46    AK  2013           6.9
47    AZ  2013           7.7
48    AR  2013           7.3
49    CA  2013           8.9
50    AL  2014           6.8
51    AK  2014           6.9
52    AZ  2014           6.8
53    AR  2014           6.1
54    CA  2014           7.5
55    AL  2015           6.1
56    AK  2015           6.5
57    AZ  2015           6.1
58    AR  2015           5.2
59    CA  2015           6.2
like image 90
jezrael Avatar answered Sep 30 '22 07:09

jezrael


Besides melt, pandas also provides wide_to_long, which is "Less flexible but more user-friendly than melt." according to the docs.

# Adding prefixes here to get nice column names afterwards
df = df.add_prefix('unemployment')
df.rename(columns={'unemploymentstate': 'state'}, inplace=True)

# Reshaping to long/tidy format
pd.wide_to_long(df, stubnames='unemployment', i='state', j='year')

        unemployment
state   year    
AL  2004    5.7
AK  2004    7.5
AZ  2004    5.0
AR  2004    5.7
CA  2004    6.2
AL  2005    4.5
...
like image 29
rvf Avatar answered Sep 30 '22 07:09

rvf