So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like
location name Jan-2010 Feb-2010 March-2010 A "test" 12 20 30 B "foo" 18 20 25
What I would like is for it to look like
location name Date Value A "test" Jan-2010 12 A "test" Feb-2010 20 A "test" March-2010 30 B "foo" Jan-2010 18 B "foo" Feb-2010 20 B "foo" March-2010 25
My problem is I don't know how many dates are in the column (though I know they will always start after name)
Pandas DataFrame. transpose() is a library function that transpose index and columns. The transpose reflects the DataFrame over its main diagonal by writing rows as columns and vice-versa. Use the T attribute or the transpose() method to swap (= transpose) the rows and columns of DataFrame.
melt() function is useful to message a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
UPDATE
From v0.20, melt
is a first order function, you can now use
df.melt(id_vars=["location", "name"], var_name="Date", value_name="Value") location name Date Value 0 A "test" Jan-2010 12 1 B "foo" Jan-2010 18 2 A "test" Feb-2010 20 3 B "foo" Feb-2010 20 4 A "test" March-2010 30 5 B "foo" March-2010 25
OLD(ER) VERSIONS: <0.20
You can use pd.melt
to get most of the way there, and then sort:
>>> df location name Jan-2010 Feb-2010 March-2010 0 A test 12 20 30 1 B foo 18 20 25 >>> df2 = pd.melt(df, id_vars=["location", "name"], var_name="Date", value_name="Value") >>> df2 location name Date Value 0 A test Jan-2010 12 1 B foo Jan-2010 18 2 A test Feb-2010 20 3 B foo Feb-2010 20 4 A test March-2010 30 5 B foo March-2010 25 >>> df2 = df2.sort(["location", "name"]) >>> df2 location name Date Value 0 A test Jan-2010 12 2 A test Feb-2010 20 4 A test March-2010 30 1 B foo Jan-2010 18 3 B foo Feb-2010 20 5 B foo March-2010 25
(Might want to throw in a .reset_index(drop=True)
, just to keep the output clean.)
Note: pd.DataFrame.sort
has been deprecated in favour of pd.DataFrame.sort_values
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With