I have the following dataframe read in from a .csv file with the "Date" column being the index. The days are in the rows and the columns show the values for the hours that day.
> Date h1 h2 h3 h4 ... h24
> 14.03.2013 60 50 52 49 ... 73
I would like to arrange it like this, so that there is one index column with the date/time and one column with the values in a sequence
>Date/Time Value
>14.03.2013 00:00:00 60
>14.03.2013 01:00:00 50
>14.03.2013 02:00:00 52
>14.03.2013 03:00:00 49
>.
>.
>.
>14.03.2013 23:00:00 73
I was trying it by using two loops to go through the dataframe. Is there an easier way to do this in pandas?
I'm not the best at date manipulations, but maybe something like this:
import pandas as pd
from datetime import timedelta

df = pd.read_csv("hourmelt.csv", sep=r"\s+")

# Reshape so the h1..h24 columns become a single "hour" column
df = pd.melt(df, id_vars=["Date"])
df = df.rename(columns={'variable': 'hour'})

# "h1" means 00:00, so strip the "h" and shift to a zero-based hour
df['hour'] = df['hour'].apply(lambda x: int(x.lstrip('h')) - 1)

# Build a full timestamp from the date plus the hour offset
combined = df.apply(lambda x:
    pd.to_datetime(x['Date'], dayfirst=True) +
    timedelta(hours=int(x['hour'])), axis=1)

df['Date'] = combined
del df['hour']
df = df.sort_values("Date")  # DataFrame.sort was removed in pandas 0.20
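On current pandas you could also replace the per-row `apply` and the Python-level `lstrip` with vectorized operations (`.str` accessor plus `to_timedelta`). A sketch of the same pipeline; the frame is built inline here (with the values from the example) so the snippet stands alone:

```python
import pandas as pd

# Inline stand-in for pd.read_csv("hourmelt.csv", sep=r"\s+"),
# using the values from the example above
df = pd.DataFrame({"Date": ["14.03.2013", "14.04.2013"],
                   "h1": [60, 5], "h2": [50, 6], "h24": [73, 9]})

df = pd.melt(df, id_vars=["Date"], var_name="hour")
# "h1" -> 0 hours offset, "h24" -> 23 hours, all vectorized
hours = df["hour"].str.lstrip("h").astype(int) - 1
df["Date"] = (pd.to_datetime(df["Date"], dayfirst=True)
              + pd.to_timedelta(hours, unit="h"))
df = df.drop(columns="hour").sort_values("Date").reset_index(drop=True)
```

After this, `df` has one timestamp row per original hour column, sorted chronologically.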
Some explanation follows.
Starting from
>>> import pandas as pd
>>> from datetime import timedelta
>>>
>>> df = pd.read_csv("hourmelt.csv", sep=r"\s+")
>>> df
Date h1 h2 h3 h4 h24
0 14.03.2013 60 50 52 49 73
1 14.04.2013 5 6 7 8 9
We can use pd.melt to make the hour columns into one column with that value:
>>> df = pd.melt(df, id_vars=["Date"])
>>> df = df.rename(columns={'variable': 'hour'})
>>> df
Date hour value
0 14.03.2013 h1 60
1 14.04.2013 h1 5
2 14.03.2013 h2 50
3 14.04.2013 h2 6
4 14.03.2013 h3 52
5 14.04.2013 h3 7
6 14.03.2013 h4 49
7 14.04.2013 h4 8
8 14.03.2013 h24 73
9 14.04.2013 h24 9
Get rid of those hs:
>>> df['hour'] = df['hour'].apply(lambda x: int(x.lstrip('h'))-1)
>>> df
Date hour value
0 14.03.2013 0 60
1 14.04.2013 0 5
2 14.03.2013 1 50
3 14.04.2013 1 6
4 14.03.2013 2 52
5 14.04.2013 2 7
6 14.03.2013 3 49
7 14.04.2013 3 8
8 14.03.2013 23 73
9 14.04.2013 23 9
Combine the two columns as a date:
>>> combined = df.apply(lambda x: pd.to_datetime(x['Date'], dayfirst=True) + timedelta(hours=int(x['hour'])), axis=1)
>>> combined
0 2013-03-14 00:00:00
1 2013-04-14 00:00:00
2 2013-03-14 01:00:00
3 2013-04-14 01:00:00
4 2013-03-14 02:00:00
5 2013-04-14 02:00:00
6 2013-03-14 03:00:00
7 2013-04-14 03:00:00
8 2013-03-14 23:00:00
9 2013-04-14 23:00:00
Reassemble and clean up:
>>> df['Date'] = combined
>>> del df['hour']
>>> df = df.sort_values("Date")
>>> df
Date value
0 2013-03-14 00:00:00 60
2 2013-03-14 01:00:00 50
4 2013-03-14 02:00:00 52
6 2013-03-14 03:00:00 49
8 2013-03-14 23:00:00 73
1 2013-04-14 00:00:00 5
3 2013-04-14 01:00:00 6
5 2013-04-14 02:00:00 7
7 2013-04-14 03:00:00 8
9 2013-04-14 23:00:00 9
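Since you wanted the date/time as the index, you can restore that at the end with `set_index`. A tiny sketch on a frame shaped like the result above (only the first two rows, for brevity):

```python
import pandas as pd

# A frame shaped like the final result above
df = pd.DataFrame({"Date": pd.to_datetime(["2013-03-14 00:00:00",
                                           "2013-03-14 01:00:00"]),
                   "value": [60, 50]})
df = df.set_index("Date")  # "Date" becomes the DatetimeIndex
```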
You could always grab the hourly values as an array and flatten it, then pair it with a new DatetimeIndex at hourly frequency:
df = df.asfreq('D')  # daily frequency, so missing days become NaN rows
hourly_data = df.values
new_ind = pd.date_range(start=df.index[0], freq="H", periods=len(df) * 24)
# create the Series
s = pd.Series(hourly_data.flatten(), index=new_ind)
I'm assuming that read_csv is parsing the 'Date' column and making it the index. We change the frequency to 'D' so that new_ind lines up correctly if you have missing days; those days are filled with np.nan, which you can drop afterwards with s.dropna().
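A self-contained sketch of this approach; I build a two-day frame inline instead of reading the CSV (column names and values are made up for illustration), and use the lowercase "h" frequency alias accepted by current pandas:

```python
import numpy as np
import pandas as pd

# Two consecutive days with 24 hourly columns each,
# a stand-in for the parsed CSV with its Date index
dates = pd.to_datetime(["2013-03-14", "2013-03-15"])
df = pd.DataFrame(np.arange(48).reshape(2, 24),
                  index=dates,
                  columns=[f"h{i}" for i in range(1, 25)])

df = df.asfreq("D")  # daily frequency, so gaps would become NaN rows
new_ind = pd.date_range(start=df.index[0], freq="h", periods=len(df) * 24)
s = pd.Series(df.values.flatten(), index=new_ind)
```

Flattening works row by row, so the 24 values of the first day land on the first 24 hourly timestamps, then the next day's values follow.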