Convert columns into rows with Pandas

Tags: python, pandas

My dataset has some information by location for n dates. The problem is that each date is a separate column header. For example, the CSV looks like:

    location    name    Jan-2010    Feb-2010    March-2010
    A           "test"  12          20          30
    B           "foo"   18          20          25

What I would like is for it to look like:

    location    name    Date        Value
    A           "test"  Jan-2010    12
    A           "test"  Feb-2010    20
    A           "test"  March-2010  30
    B           "foo"   Jan-2010    18
    B           "foo"   Feb-2010    20
    B           "foo"   March-2010  25

My problem is that I don't know how many date columns there are (though I know they will always start after name).

332

asked Feb 22 '15 03:02

Wizuriel

1 Answer

UPDATE
From pandas v0.20, melt is also available as a DataFrame method, so you can now use:

    df.melt(id_vars=["location", "name"],
            var_name="Date",
            value_name="Value")

      location    name        Date  Value
    0        A  "test"    Jan-2010     12
    1        B   "foo"    Jan-2010     18
    2        A  "test"    Feb-2010     20
    3        B   "foo"    Feb-2010     20
    4        A  "test"  March-2010     30
    5        B   "foo"  March-2010     25
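Since the question says the number of date columns is unknown, it may help to see that melt does not need them listed. The following is a minimal sketch, assuming the sample data from the question. The helper name unpivot_dates and the extra Apr-2010 column are made up for illustration. When value_vars is omitted, every column not named in id_vars is unpivoted.

```python
import pandas as pd


def unpivot_dates(df, id_cols=("location", "name")):
    """Hypothetical helper: melt every column that is not an id column."""
    # value_vars is left out on purpose, so melt uses all remaining columns.
    return df.melt(id_vars=list(id_cols), var_name="Date", value_name="Value")


# Sample data reconstructed from the question.
wide = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

long_df = unpivot_dates(wide)

# Adding another date column (invented here) needs no change to the call.
wide["Apr-2010"] = [40, 35]
longer_df = unpivot_dates(wide)

print(long_df)
print(longer_df)
```

The same call works for three date columns or three hundred, as long as the id columns are known.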

OLD(ER) VERSIONS: <0.20

You can use pd.melt to get most of the way there, and then sort:

    >>> df
      location  name  Jan-2010  Feb-2010  March-2010
    0        A  test        12        20          30
    1        B   foo        18        20          25
    >>> df2 = pd.melt(df, id_vars=["location", "name"],
    ...               var_name="Date", value_name="Value")
    >>> df2
      location  name        Date  Value
    0        A  test    Jan-2010     12
    1        B   foo    Jan-2010     18
    2        A  test    Feb-2010     20
    3        B   foo    Feb-2010     20
    4        A  test  March-2010     30
    5        B   foo  March-2010     25
    >>> df2 = df2.sort(["location", "name"])
    >>> df2
      location  name        Date  Value
    0        A  test    Jan-2010     12
    2        A  test    Feb-2010     20
    4        A  test  March-2010     30
    1        B   foo    Jan-2010     18
    3        B   foo    Feb-2010     20
    5        B   foo  March-2010     25

(You might want to add a .reset_index(drop=True) to renumber the rows and keep the output clean.)

Note: pd.DataFrame.sort was deprecated in pandas 0.17 and removed in 0.20. Use pd.DataFrame.sort_values instead.
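On current pandas, the melt, sort and re-index steps can be sketched as below. This is one possible approach and not part of the original answer. It assumes the sample data from the question, and it turns Date into an ordered categorical (a swapped-in technique) so that the dates within each group follow the original column order instead of relying on how ties are broken during sorting.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

id_cols = ["location", "name"]
# Every non-id column is treated as a date column, in its original order.
date_cols = [c for c in df.columns if c not in id_cols]

df2 = df.melt(id_vars=id_cols, var_name="Date", value_name="Value")

# An ordered categorical sorts by column position, not alphabetically.
df2["Date"] = pd.Categorical(df2["Date"], categories=date_cols, ordered=True)

# sort_values replaces the removed DataFrame.sort; reset_index renumbers rows.
df2 = df2.sort_values(id_cols + ["Date"]).reset_index(drop=True)

print(df2)
```

If the headers all shared one date format, parsing them with pd.to_datetime and sorting on the parsed values would be an alternative. The sample mixes "Jan" and "March", so the categorical keeps the sketch independent of parsing.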

182

answered Oct 06 '22 14:10

DSM
