Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to melt `pandas.DataFrame` with hour columns and divide them on 15 minutes intervals

I have a DataFrame something like this:

data = [['2019-01-01', .1, .2],
        ['2019-01-02', .5, .3],
        ['2019-01-03', .2, .4]]
df = pd.DataFrame(data, columns=['date', 'hour01', 'hour02'])
         date  hour01  hour02
0  2019-01-01     0.1     0.2
1  2019-01-02     0.5     0.3
2  2019-01-03     0.2     0.4

How to melt it so that I get proper 15 minutes intervals? Like this:

    timestamp              value
0  2019-01-01 00:00:00     0.1
1  2019-01-01 00:15:00     0.1
2  2019-01-01 00:30:00     0.1
3  2019-01-01 00:45:00     0.1
4  2019-01-01 01:00:00     0.2
5  2019-01-01 01:15:00     0.2
6  2019-01-01 01:30:00     0.2
7  2019-01-01 01:45:00     0.2
...
16 2019-01-03 00:00:00     0.2
17 2019-01-03 00:15:00     0.2
18 2019-01-03 00:30:00     0.2
19 2019-01-03 00:45:00     0.2
20 2019-01-03 01:00:00     0.4
21 2019-01-03 01:15:00     0.4
22 2019-01-03 01:30:00     0.4
23 2019-01-03 01:45:00     0.4

Edit

df.melt(id_vars=['timestamp'], value_vars=['hour_{}'.format(str(x).zfill(2)) for x in range(1, 24)])

gives me this:

27    2017-01-28  hour_01  34.90
28    2017-01-29  hour_01  36.04
29    2017-01-30  hour_01  36.51
          ...      ...    ...
16760 2018-12-02  hour_23  51.50
16761 2018-12-03  hour_23  54.00
16762 2018-12-04  hour_23  53.87

Where to go from here?

like image 609
miller Avatar asked Oct 22 '25 02:10

miller


1 Answers

Maybe you can do it starting with melt too, but unless using melt is by some reason a requirement, you can obtain it in this way:

  1. Make 'date' a datetime column, if not already.
  2. Using groupby and apply you can generate the timestamps for all the time intervals using pandas date_range and spanning the hourly values using numpy repeat.
  3. Finally reset the index.

Translated in a code is:

df['date'] = pd.to_datetime(df['date'])

ddf = df.groupby('date').apply(lambda row : pd.DataFrame(
      {'timestamp' : pd.date_range(row['date'].iloc[0], periods=4*len(df.columns[1:]), freq='15T'),
       'value' : np.repeat(np.array([row[col].iloc[0] for col in df.columns[1:]]), 4)}))
ddf.reset_index(inplace=True, drop=True)

Using your starting dataframe, ddf is:

             timestamp  value
0  2019-01-01 00:00:00    0.1
1  2019-01-01 00:15:00    0.1
2  2019-01-01 00:30:00    0.1
3  2019-01-01 00:45:00    0.1
4  2019-01-01 01:00:00    0.2
5  2019-01-01 01:15:00    0.2
6  2019-01-01 01:30:00    0.2
7  2019-01-01 01:45:00    0.2
8  2019-01-02 00:00:00    0.5
9  2019-01-02 00:15:00    0.5
10 2019-01-02 00:30:00    0.5
11 2019-01-02 00:45:00    0.5
12 2019-01-02 01:00:00    0.3
13 2019-01-02 01:15:00    0.3
14 2019-01-02 01:30:00    0.3
15 2019-01-02 01:45:00    0.3
16 2019-01-03 00:00:00    0.2
17 2019-01-03 00:15:00    0.2
18 2019-01-03 00:30:00    0.2
19 2019-01-03 00:45:00    0.2
20 2019-01-03 01:00:00    0.4
21 2019-01-03 01:15:00    0.4
22 2019-01-03 01:30:00    0.4
23 2019-01-03 01:45:00    0.4

This code will automatically pick how many columns you have after 'date', assuming that they are all 'hour' columns. If you have other columns mixed in the dataframe, they should be filtered out from df.columns[1:].

like image 99
Valentino Avatar answered Oct 24 '25 16:10

Valentino



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!