I have a dataset with the exact time and quantity of e-vehicles charging at a charging station and its charging power in kW. What I am trying to do is to calculate how many e-vehicles are charging at the exact same time (taking into account <code>start</code> and <code>end</code> times), and according to that, calculate the required charging power (sum up the <code>power_kW</code>) needed for each hour. For example, from <code>18-19h</code> I need only 100kW of power. Below I shared a part of the data, I obviously have the data for all the 24 hours, and for each hour I should have a total needed charging station power in kW. Obviously, if one vehicle finishes charging at 20:50, and another one starts at 20:50 - they overlap and I need to count <code>2*power_kW</code>. Also, in some slots, I have more than 1 vehicle starting and finishing at the same time. I find it hard to do it fast and automatically with the code, given that the times are in <code>hour:minute</code> format, so I seek help here. I checked similar posts and couldn't find anything to bring me to a solution on my own. Does someone have an idea how to do this efficently and correctly? Input data: <pre class="prettyprint lang-py prettyprint-override"><code>import pandas as pd mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count'] data = [['18H-19H', '18:26', '20:27', 100, 1], ['19H-20H', '19:06', '21:16', 100, 1], ['19H-20H', '19:19', '21:16', 100, 1], ['19H-20H', '19:36', '21:33', 100, 1], ['19H-20H', '19:54', '20:19', 100, 2], ['19H-20H', '19:55', '22:01', 100, 1], ['20H-21H', '20:04', '22:06', 100, 1], ['20H-21H', '20:11', '22:04', 100, 2], ['20H-21H', '20:15', '22:04', 100, 1], ['20H-21H', '20:16', '22:04', 100, 1], ['20H-21H', '20:17', '22:08', 100, 1], ['20H-21H', '20:19', '22:09', 100, 1], ['20H-21H', '20:20', '22:01', 100, 2], ['20H-21H', '20:22', '22:35', 100, 1], ['20H-21H', '20:28', '22:34', 100, 2], ['20H-21H', '20:29', '22:22', 100, 1], ['20H-21H', '20:30', '22:14', 100, 1], ['20H-21H', '20:31', '22:10', 100, 1], ['20H-21H', '20:37', '22:31', 100, 1], ['20H-21H', '20:41', '22:29', 100, 2], ['20H-21H', '20:43', '22:34', 100, 1], ['20H-21H', '20:46', '22:39', 100, 1], ['20H-21H', '20:47', '22:35', 100, 1], ['20H-21H', '20:52', '22:34', 100, 1], ['20H-21H', '20:52', '23:09', 100, 1], ['20H-21H', '20:53', '21:59', 100, 1], ['20H-21H', '20:54', '21:49', 100, 1], ['20H-21H', '20:56', '22:10', 100, 1], ['20H-21H', '20:56', '22:55', 100, 1], ['21H-22H', '21:03', '22:51', 100, 1], ['21H-22H', '21:05', '23:12', 100, 1], ['21H-22H', '21:08', '22:59', 100, 1], ['21H-22H', '21:10', '23:27', 100, 1], ['21H-22H', '21:10', '23:30', 100, 1], ['21H-22H', '21:15', '23:23', 100, 1], ['21H-22H', '21:19', '21:56', 100, 1], ['21H-22H', '21:21', '22:48', 100, 1], ['21H-22H', '21:25', '23:26', 100, 1], ['21H-22H', '21:25', '23:32', 100, 1], ['21H-22H', '21:27', '22:55', 100, 1], ['21H-22H', '21:27', '23:32', 100, 1], ['21H-22H', '21:33', '23:11', 100, 1], ['21H-22H', '21:37', '23:04', 100, 1], ['21H-22H', '21:39', '00:05', 100, 1], ['21H-22H', '21:40', '23:08', 100, 1], ['21H-22H', '21:45', '23:04', 100, 1], ['21H-22H', '21:49', '00:06', 100, 1], ['21H-22H', '21:54', '00:07', 100, 1], ['21H-22H', '21:58', '00:02', 100, 1], ['21H-22H', '21:58', '00:24', 100, 1], ['22H-23H', '22:10', '00:19', 100, 1]] df = pd.DataFrame(data, columns=mycolumns) df start_timeslot start end power_kW vehicle_count 0 18H-19H 18:26 20:27 100 1 1 19H-20H 19:06 21:16 100 1 2 19H-20H 19:19 21:16 100 1 3 19H-20H 19:36 21:33 100 1 4 19H-20H 19:54 20:19 100 2 5 19H-20H 19:55 22:01 100 1 6 20H-21H 20:04 22:06 100 1 7 20H-21H 20:11 22:04 100 2 8 20H-21H 20:15 22:04 100 1 9 20H-21H 20:16 22:04 100 1 10 20H-21H 20:17 22:08 100 1 11 20H-21H 20:19 22:09 100 1 12 20H-21H 20:20 22:01 100 2 13 20H-21H 20:22 22:35 100 1 14 20H-21H 20:28 22:34 100 2 15 20H-21H 20:29 22:22 100 1 16 20H-21H 20:30 22:14 100 1 17 20H-21H 20:31 22:10 100 1 18 20H-21H 20:37 22:31 100 1 19 20H-21H 20:41 22:29 100 2 20 20H-21H 20:43 22:34 100 1 21 20H-21H 20:46 22:39 100 1 22 20H-21H 20:47 22:35 100 1 23 20H-21H 20:52 22:34 100 1 24 20H-21H 20:52 23:09 100 1 25 20H-21H 20:53 21:59 100 1 26 20H-21H 20:54 21:49 100 1 27 20H-21H 20:56 22:10 100 1 28 20H-21H 20:56 22:55 100 1 29 21H-22H 21:03 22:51 100 1 30 21H-22H 21:05 23:12 100 1 31 21H-22H 21:08 22:59 100 1 32 21H-22H 21:10 23:27 100 1 33 21H-22H 21:10 23:30 100 1 34 21H-22H 21:15 23:23 100 1 35 21H-22H 21:19 21:56 100 1 36 21H-22H 21:21 22:48 100 1 37 21H-22H 21:25 23:26 100 1 38 21H-22H 21:25 23:32 100 1 39 21H-22H 21:27 22:55 100 1 40 21H-22H 21:27 23:32 100 1 41 21H-22H 21:33 23:11 100 1 42 21H-22H 21:37 23:04 100 1 43 21H-22H 21:39 00:05 100 1 44 21H-22H 21:40 23:08 100 1 45 21H-22H 21:45 23:04 100 1 46 21H-22H 21:49 00:06 100 1 47 21H-22H 21:54 00:07 100 1 48 21H-22H 21:58 00:02 100 1 49 21H-22H 21:58 00:24 100 1 50 22H-23H 22:10 00:19 100 1 </code></pre> Expected output should be: <pre class="prettyprint"><code> start_timeslot needed_power_kW 0 18H-19H 100 1 19H-20H 700 2 20H-21H 3100 3 21H-22H 4600 4 22H-23H 4600 5 23H-00H 1800 6 00H-01H 600 </code></pre> <hr> I am adding an example from the real dataset, because with the proposed solutions I cannot get the correct answer for all options. <pre class="prettyprint"><code>mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count'] data = [['00H-01H', '00:05', '00:09', 200.0, 1], ['00H-01H', '00:35', '00:39', 200.0, 1], ['01H-02H', '01:05', '01:09', 200.0, 1], ['05H-06H', '05:34', '05:41', 200.0, 1], ['05H-06H', '05:54', '06:01', 200.0, 1], ['06H-07H', '06:20', '06:27', 200.0, 1], ['06H-07H', '06:44', '06:47', 200.0, 1], ['06H-07H', '06:59', '07:06', 200.0, 1], ['07H-08H', '07:18', '07:22', 200.0, 1], ['07H-08H', '07:36', '07:40', 200.0, 1], ['07H-08H', '07:49', '07:56', 200.0, 1], ['08H-09H', '08:01', '08:05', 200.0, 1], ['08H-09H', '08:08', '08:14', 200.0, 1], ['08H-09H', '08:14', '08:20', 200.0, 1], ['08H-09H', '08:21', '08:26', 200.0, 1], ['08H-09H', '08:28', '08:35', 200.0, 1], ['08H-09H', '08:35', '08:42', 200.0, 1], ['08H-09H', '08:42', '08:46', 200.0, 1], ['08H-09H', '08:49', '08:56', 200.0, 1], ['08H-09H', '08:55', '09:02', 200.0, 1], ['08H-09H', '09:00', '09:04', 200.0, 1], ['09H-10H', '09:07', '09:13', 200.0, 1], ['09H-10H', '09:13', '09:20', 200.0, 1], ['09H-10H', '09:25', '09:29', 200.0, 1], ['09H-10H', '09:31', '09:36', 200.0, 1], ['09H-10H', '09:37', '09:42', 200.0, 1], ['09H-10H', '09:43', '09:47', 200.0, 1], ['09H-10H', '09:49', '09:56', 200.0, 1], ['09H-10H', '09:55', '10:02', 200.0, 1], ['10H-11H', '10:01', '10:05', 200.0, 1], ['10H-11H', '10:06', '10:13', 200.0, 1], ['10H-11H', '10:12', '10:16', 200.0, 1], ['10H-11H', '10:18', '10:23', 200.0, 1], ['10H-11H', '10:23', '10:27', 200.0, 1], ['10H-11H', '10:29', '10:33', 200.0, 1], ['10H-11H', '10:35', '10:39', 200.0, 1], ['10H-11H', '10:41', '10:45', 200.0, 1], ['10H-11H', '10:47', '10:52', 200.0, 1], ['10H-11H', '10:53', '10:57', 200.0, 1], ['10H-11H', '10:59', '11:03', 200.0, 1], ['11H-12H', '11:05', '11:09', 200.0, 1], ['11H-12H', '11:11', '11:15', 200.0, 1], ['11H-12H', '11:17', '11:21', 200.0, 1], ['11H-12H', '11:23', '11:27', 200.0, 1], ['11H-12H', '11:29', '11:33', 200.0, 1], ['11H-12H', '11:35', '11:39', 200.0, 1], ['11H-12H', '11:41', '11:46', 200.0, 1], ['11H-12H', '11:47', '11:51', 200.0, 1], ['11H-12H', '11:53', '11:57', 200.0, 1], ['11H-12H', '11:59', '12:03', 200.0, 1], ['12H-13H', '12:05', '12:09', 200.0, 1], ['12H-13H', '12:11', '12:15', 200.0, 1], ['12H-13H', '12:17', '12:21', 200.0, 1], ['12H-13H', '12:22', '12:26', 200.0, 1], ['12H-13H', '12:28', '12:33', 200.0, 1], ['12H-13H', '12:33', '12:37', 200.0, 1], ['12H-13H', '12:38', '12:43', 200.0, 1], ['12H-13H', '12:43', '12:47', 200.0, 1], ['12H-13H', '12:49', '12:53', 200.0, 1], ['12H-13H', '12:55', '12:59', 200.0, 1], ['13H-14H', '13:01', '13:05', 200.0, 1], ['13H-14H', '13:07', '13:12', 200.0, 1], ['13H-14H', '13:13', '13:17', 200.0, 1], ['13H-14H', '13:19', '13:23', 200.0, 1], ['13H-14H', '13:25', '13:29', 200.0, 1], ['13H-14H', '13:31', '13:35', 200.0, 1], ['13H-14H', '13:37', '13:41', 200.0, 1], ['13H-14H', '13:43', '13:47', 200.0, 1], ['13H-14H', '13:49', '13:53', 200.0, 1], ['13H-14H', '13:55', '13:59', 200.0, 1], ['14H-15H', '14:01', '14:06', 200.0, 1], ['14H-15H', '14:07', '14:11', 200.0, 1], ['14H-15H', '14:13', '14:17', 200.0, 1], ['14H-15H', '14:19', '14:23', 200.0, 1], ['14H-15H', '14:25', '14:29', 200.0, 1], ['14H-15H', '14:31', '14:35', 200.0, 1], ['14H-15H', '14:37', '14:41', 200.0, 1], ['14H-15H', '14:42', '14:46', 200.0, 1], ['14H-15H', '14:48', '14:52', 200.0, 1], ['14H-15H', '14:54', '14:58', 200.0, 1], ['14H-15H', '14:59', '15:03', 200.0, 1], ['15H-16H', '15:04', '15:08', 200.0, 1], ['15H-16H', '15:09', '15:13', 200.0, 1], ['15H-16H', '15:15', '15:19', 200.0, 1], ['15H-16H', '15:21', '15:25', 200.0, 1], ['15H-16H', '15:27', '15:31', 200.0, 1], ['15H-16H', '15:33', '15:37', 200.0, 1], ['15H-16H', '15:39', '15:43', 200.0, 1], ['15H-16H', '15:45', '15:49', 200.0, 1], ['15H-16H', '15:51', '15:55', 200.0, 1], ['15H-16H', '15:57', '16:01', 200.0, 1], ['16H-17H', '16:03', '16:07', 200.0, 1], ['16H-17H', '16:09', '16:13', 200.0, 1], ['16H-17H', '16:15', '16:19', 200.0, 1], ['16H-17H', '16:21', '16:27', 200.0, 1], ['16H-17H', '16:27', '16:31', 200.0, 1], ['16H-17H', '16:33', '16:37', 200.0, 1], ['16H-17H', '16:39', '16:43', 200.0, 1], ['16H-17H', '16:45', '16:49', 200.0, 1], ['16H-17H', '16:51', '16:55', 200.0, 1], ['16H-17H', '16:57', '17:01', 200.0, 1], ['17H-18H', '17:04', '17:08', 200.0, 1], ['17H-18H', '17:10', '17:14', 200.0, 1], ['17H-18H', '17:16', '17:20', 200.0, 1], ['17H-18H', '17:22', '17:26', 200.0, 1], ['17H-18H', '17:28', '17:32', 200.0, 1], ['17H-18H', '17:34', '17:38', 200.0, 1], ['17H-18H', '17:40', '17:44', 200.0, 1], ['17H-18H', '17:46', '17:50', 200.0, 1], ['17H-18H', '17:51', '17:55', 200.0, 1], ['17H-18H', '17:57', '18:01', 200.0, 1], ['18H-19H', '18:03', '18:07', 200.0, 1], ['18H-19H', '18:09', '18:13', 200.0, 1], ['18H-19H', '18:15', '18:19', 200.0, 1], ['18H-19H', '18:21', '18:25', 200.0, 1], ['18H-19H', '18:27', '18:31', 200.0, 1], ['18H-19H', '18:33', '18:37', 200.0, 1], ['18H-19H', '18:39', '18:43', 200.0, 1], ['18H-19H', '18:45', '18:49', 200.0, 1], ['18H-19H', '18:51', '18:55', 200.0, 1], ['18H-19H', '18:57', '19:01', 200.0, 1], ['19H-20H', '19:04', '19:08', 200.0, 1], ['19H-20H', '19:11', '19:15', 200.0, 1], ['19H-20H', '19:18', '19:22', 200.0, 1], ['19H-20H', '19:25', '19:29', 200.0, 1], ['19H-20H', '19:32', '19:36', 200.0, 1], ['19H-20H', '19:39', '19:43', 200.0, 1], ['19H-20H', '19:46', '19:50', 200.0, 1], ['19H-20H', '19:52', '19:56', 200.0, 1], ['20H-21H', '20:07', '20:11', 200.0, 1], ['20H-21H', '20:13', '20:17', 200.0, 1], ['20H-21H', '20:20', '20:24', 200.0, 1], ['20H-21H', '20:28', '20:32', 200.0, 1], ['20H-21H', '20:35', '20:39', 200.0, 1], ['20H-21H', '20:42', '20:46', 200.0, 1], ['20H-21H', '20:49', '20:53', 200.0, 1], ['20H-21H', '20:55', '20:59', 200.0, 1], ['21H-22H', '21:10', '21:14', 200.0, 1], ['21H-22H', '21:23', '21:24', 200.0, 1], ['21H-22H', '21:39', '21:43', 200.0, 1], ['22H-23H', '22:08', '22:12', 200.0, 1], ['22H-23H', '22:38', '22:42', 200.0, 1], ['23H-00H', '23:08', '23:12', 200.0, 1], ['23H-00H', '23:36', '23:40', 200.0, 1]] df = pd.DataFrame(data, columns=mycolumns) </code></pre>

There are two main steps to this solution. First, you need to create a new column that you can group over. Simply using <code>start_timeslot</code> won't work as rows in earlier timeslots that have not stopped charging can still affect later <code>start_timeslot</code> groups. New groups can be created using the <code>start</code> and <code>end</code> values together with <code>date_range</code>. Note that extra care must be taken when the end value is in a new day. <pre class="prettyprint"><code>df['start'] = pd.to_datetime(df['start']) df['end'] = pd.to_datetime(df['end']) df.loc[df['start'] > df['end'], 'end'] = df['end'] + pd.DateOffset(1) df['hours'] = df.apply(lambda row: pd.date_range(row['start'].floor('H'), row['end'].floor('H'), freq='H').hour, axis=1) </code></pre> Intermediate result: <pre class="prettyprint"><code> start_timeslot start end power_kW vehicle_count hours 0 18H-19H 2021-11-18 18:00:00 2021-11-18 20:00:00 100 1 Int64Index([18, 19, 20], dtype='int64') 1 19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00 100 1 Int64Index([19, 20, 21], dtype='int64') 2 19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00 100 1 Int64Index([19, 20, 21], dtype='int64') 3 19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00 100 1 Int64Index([19, 20, 21], dtype='int64') 4 19H-20H 2021-11-18 19:00:00 2021-11-18 20:00:00 100 2 Int64Index([19, 20], dtype='int64') 5 19H-20H 2021-11-18 19:00:00 2021-11-18 22:00:00 100 1 Int64Index([19, 20, 21, 22], dtype='int64') </code></pre> Now, we can <code>explode</code> the new column and then group by it. The second step is to compute the maximum required power for a group. We can simply take the start and end times and sort them. We consider the start time as positive (we add the required power) and end times as negative (subtract the power). Next, we sort by time and then use <code>cumsum</code> on the series to get the cumulative sum and then take the maximum value of this. This will be the maximum required power for a group. Code: <pre class="prettyprint"><code>def func(df): df['power'] = df['power_kW'] * df['vehicle_count'] pos = df.set_index('start')['power'] neg = df.set_index('end')['power'] * -1 df_merge = pd.concat([pos, neg]).reset_index() maximum = df_merge.sort_values(['index', 'power'], ascending=[True, False])['power'].cumsum().max() return maximum res = df.explode('hours').groupby('hours').apply(func) </code></pre> Note that if there is a car that stops charging at e.g. 10:35 and another one that starts at 10:35, this will be considered an overlap with the above code. If these should not overlap then change the sorting to use <code>ascending=[True, True]</code>. Result: <pre class="prettyprint"><code>hours 0 600 18 100 19 700 20 3100 21 4700 22 4600 23 1800 dtype: int64 </code></pre> The <code>start_timeslot</code> column is gone so some final postprocessing to obtain the expected result is required: <pre class="prettyprint"><code>res = res.to_frame('needed_power_kW').reset_index().rename(columns={'hours': 'start_timeslot'}) res['end_hour'] = (pd.to_datetime(res['start_timeslot'], unit='h') + pd.DateOffset(hours=1)).dt.hour res['start_timeslot'] = res['start_timeslot'].astype(str).str.zfill(2) + 'H-' + res['end_hour'].astype(str).str.zfill(2) + 'H' res = res.drop('end_hour', axis=1) </code></pre> Final result: <pre class="prettyprint"><code> start_timeslot needed_power_kW 0 00H-01H 600 1 18H-19H 100 2 19H-20H 700 3 20H-21H 3100 4 21H-22H 4600 5 22H-23H 4600 6 23H-00H 1800 </code></pre> Using the additional data: <pre class="prettyprint"><code> start_timeslot needed_power_kW 0 00H-01H 200.0 1 01H-02H 200.0 2 05H-06H 200.0 3 06H-07H 200.0 4 07H-08H 200.0 5 08H-09H 400.0 6 09H-10H 400.0 7 10H-11H 400.0 8 11H-12H 200.0 9 12H-13H 400.0 10 13H-14H 200.0 11 14H-15H 200.0 12 15H-16H 200.0 13 16H-17H 400.0 14 17H-18H 200.0 15 18H-19H 200.0 16 19H-20H 200.0 17 20H-21H 200.0 18 21H-22H 200.0 19 22H-23H 200.0 20 23H-00H 200.0 </code></pre> Note: by changing the sorting as described above, the <code>needed_power_kW</code> for both <code>12H-13H</code> and <code>16H-17H</code> will be 200.

I would propose this approach. First, change all the times to minutes per day. (If you have a larger dataset over days, you could adapt the code below using <code>datetime</code> objects with the appropriate resolution, e.g. seconds.) That is done using the following function <pre class="prettyprint"><code>import numpy as np import pandas as pd import matplotlib.pyplot as plt def toMinutes(timeString): t = [int(i) for i in timeString.split(':')] t = 60 * t[0] + t[1] return(t) </code></pre> Next, I would iterate through your dataset noting the times where there were increases and decreases in usage along with the amount of power. You don't explicitly mention tracking the number of vehicles in your question but I am going to track that as well as power since they are pretty much the same from a counting point of view. <pre class="prettyprint"><code>def add_row(timeString, power, nVehicle, arrive = True): t = toMinutes(timeString) delta = power * nVehicle if arrive: return([t / 60, delta, nVehicle]) else: return([t / 60, -delta, - nVehicle]) </code></pre> Running through your <code>dataframe</code> we can create a list of times where vehicles arrived or left and monitor the resulting change in power consumption. So each entry in your <code>dataframe</code> results in two entries in the list below, one when the vehicle arrives and one when it leaves. You stated in your question that a vehicle arriving at 20:50 and another leaving at 20:50 had to count as two vehicles at 20:50, so do all the arrivals first and then all the departures. That way when you sort below, it will count the arrivals before the departures. <pre class="prettyprint"><code> x = [[0, 0, 0],] # initialize list (mostly for plotting, you can skip this if not plotting) x += [add_row(t, P, n) \ for t, P, n in zip(df['start'], df['power_kW'], df['vehicle_count'])] x += [add_row(t, P, n, arrive = False) \ for t, P, n in zip(df['end'], df['power_kW'], df['vehicle_count'])] </code></pre> Now sort <code>x</code> by the time column so we have an ordered time series of discrete events that we care about. The power consumption and number of vehicles is just the sum of past events in our dataset. <pre class="prettyprint"><code>x = np.array(sorted(x)) t = x[:,0] P = np.cumsum(x[:,1]) N = np.cumsum(x[:,2]) </code></pre> I like plotting things so here is a script to make sure the code is doing what we want so far. <pre class="prettyprint"><code>def plot_events(t, P, n): Nevent = len(P) to = [] Po = [] No = [] for i in range(1, Nevent): to.append(t[i]) to.append(t[i]) Po.append(P[i-1]) Po.append(P[i]) No.append(N[i-1]) No.append(N[i]) fig, ax = plt.subplots(figsize = (6,8), nrows=2) ax[0].plot(to, Po) ax[0].set_xlabel('Time in Hours') ax[0].set_ylabel('Power in kW') ax[1].plot(to, No) ax[1].set_xlabel('Time in Hours') ax[1].set_ylabel('Number of vehicles charging') return(fig, ax) fig, ax = plot_events(t, P, N) </code></pre> <img src="https://i.stack.imgur.com/aTGaU.png" alt="events"> Now to bin the events in time, we just look at the maximum demand in each time window of interest. <pre class="prettyprint"><code>def bin_events(t, y, dt = 1, mint = 0, maxt = 24): # find max value of y in t bins of size dt x = np.arange(mint, maxt, dt) z = np.zeros(x.shape) for i in range(len(x)): try: z[i] = np.max(y[(t >= x[i]) & (t < x[i] + dt)]) except ValueError: pass return(z) maxP = bin_events(t, P) maxN = bin_events(t, N) </code></pre> And again, plot to make sure we're doing what we want: <pre class="prettyprint"><code>ax[0].bar(np.arange(0,24) + 0.5, maxP, color = 'k', alpha = 0.3, width = 1) ax[1].bar(np.arange(0,24) + 0.5, maxN, color = 'k', alpha = 0.3, width = 1) </code></pre> <img src="https://i.stack.imgur.com/EJmk9.png" alt="events with max values"> Finally, put the new info to a <code>dataframe</code> or add it to your existing one: <pre class="prettyprint"><code>col = 'start_timeslot' choices = [f"{i:02}H-{(i+1)%24:02}H" for i in range(24)] conditions = [eval(f"df[col] == '{c}'") for c in choices] summary_data_df = pd.DataFrame(np.array([choices, maxP, maxN]).transpose(), columns=['start_timeslot', 'max_power_kW', 'max_vehicles']) df["max_power_kW"] = np.select(conditions, maxP, default=0) df["max_vehicles"] = np.select(conditions, maxN, default=0) </code></pre> <pre class="prettyprint"><code> start_timeslot max_power_kW max_vehicles 0 00H-01H 200.0 1.0 1 01H-02H 200.0 1.0 2 02H-03H 0.0 0.0 3 03H-04H 0.0 0.0 4 04H-05H 0.0 0.0 5 05H-06H 200.0 1.0 6 06H-07H 200.0 1.0 7 07H-08H 200.0 1.0 8 08H-09H 400.0 2.0 9 09H-10H 400.0 2.0 10 10H-11H 400.0 2.0 11 11H-12H 200.0 1.0 12 12H-13H 200.0 1.0 13 13H-14H 200.0 1.0 14 14H-15H 200.0 1.0 15 15H-16H 200.0 1.0 16 16H-17H 200.0 1.0 17 17H-18H 200.0 1.0 18 18H-19H 200.0 1.0 19 19H-20H 200.0 1.0 20 20H-21H 200.0 1.0 21 21H-22H 200.0 1.0 22 22H-23H 200.0 1.0 23 23H-00H 200.0 1.0 </code></pre>

Counting the occurence based on overlapping intervals (HH:MM) in Pandas

Tags:

python

pandas

I have a dataset with the exact time and quantity of e-vehicles charging at a charging station and its charging power in kW.

What I am trying to do is to calculate how many e-vehicles are charging at the exact same time (taking into account start and end times), and according to that, calculate the required charging power (sum up the power_kW) needed for each hour. For example, from 18-19h I need only 100kW of power.

Below I shared a part of the data, I obviously have the data for all the 24 hours, and for each hour I should have a total needed charging station power in kW. Obviously, if one vehicle finishes charging at 20:50, and another one starts at 20:50 - they overlap and I need to count 2*power_kW. Also, in some slots, I have more than 1 vehicle starting and finishing at the same time.

I find it hard to do it fast and automatically with the code, given that the times are in hour:minute format, so I seek help here. I checked similar posts and couldn't find anything to bring me to a solution on my own.

Does someone have an idea how to do this efficently and correctly?

Input data:

import pandas as pd
mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count']
data = [['18H-19H', '18:26', '20:27', 100, 1], ['19H-20H', '19:06', '21:16', 100, 1], 
        ['19H-20H', '19:19', '21:16', 100, 1], ['19H-20H', '19:36', '21:33', 100, 1], 
        ['19H-20H', '19:54', '20:19', 100, 2], ['19H-20H', '19:55', '22:01', 100, 1],
        ['20H-21H', '20:04', '22:06', 100, 1], ['20H-21H', '20:11', '22:04', 100, 2], 
        ['20H-21H', '20:15', '22:04', 100, 1], ['20H-21H', '20:16', '22:04', 100, 1], 
        ['20H-21H', '20:17', '22:08', 100, 1], ['20H-21H', '20:19', '22:09', 100, 1],
        ['20H-21H', '20:20', '22:01', 100, 2], ['20H-21H', '20:22', '22:35', 100, 1], 
        ['20H-21H', '20:28', '22:34', 100, 2], ['20H-21H', '20:29', '22:22', 100, 1], 
        ['20H-21H', '20:30', '22:14', 100, 1], ['20H-21H', '20:31', '22:10', 100, 1],
        ['20H-21H', '20:37', '22:31', 100, 1], ['20H-21H', '20:41', '22:29', 100, 2], 
        ['20H-21H', '20:43', '22:34', 100, 1], ['20H-21H', '20:46', '22:39', 100, 1], 
        ['20H-21H', '20:47', '22:35', 100, 1], ['20H-21H', '20:52', '22:34', 100, 1],
        ['20H-21H', '20:52', '23:09', 100, 1], ['20H-21H', '20:53', '21:59', 100, 1], 
        ['20H-21H', '20:54', '21:49', 100, 1], ['20H-21H', '20:56', '22:10', 100, 1], 
        ['20H-21H', '20:56', '22:55', 100, 1], ['21H-22H', '21:03', '22:51', 100, 1],
        ['21H-22H', '21:05', '23:12', 100, 1], ['21H-22H', '21:08', '22:59', 100, 1], 
        ['21H-22H', '21:10', '23:27', 100, 1], ['21H-22H', '21:10', '23:30', 100, 1], 
        ['21H-22H', '21:15', '23:23', 100, 1], ['21H-22H', '21:19', '21:56', 100, 1],
        ['21H-22H', '21:21', '22:48', 100, 1], ['21H-22H', '21:25', '23:26', 100, 1], 
        ['21H-22H', '21:25', '23:32', 100, 1], ['21H-22H', '21:27', '22:55', 100, 1], 
        ['21H-22H', '21:27', '23:32', 100, 1], ['21H-22H', '21:33', '23:11', 100, 1],
        ['21H-22H', '21:37', '23:04', 100, 1], ['21H-22H', '21:39', '00:05', 100, 1], 
        ['21H-22H', '21:40', '23:08', 100, 1], ['21H-22H', '21:45', '23:04', 100, 1], 
        ['21H-22H', '21:49', '00:06', 100, 1], ['21H-22H', '21:54', '00:07', 100, 1],
        ['21H-22H', '21:58', '00:02', 100, 1], ['21H-22H', '21:58', '00:24', 100, 1], 
        ['22H-23H', '22:10', '00:19', 100, 1]]
        
df = pd.DataFrame(data, columns=mycolumns)

df
   start_timeslot   start   end     power_kW   vehicle_count
0   18H-19H        18:26    20:27   100         1
1   19H-20H        19:06    21:16   100         1
2   19H-20H        19:19    21:16   100         1
3   19H-20H        19:36    21:33   100         1
4   19H-20H        19:54    20:19   100         2
5   19H-20H        19:55    22:01   100         1
6   20H-21H        20:04    22:06   100         1
7   20H-21H        20:11    22:04   100         2
8   20H-21H        20:15    22:04   100         1
9   20H-21H        20:16    22:04   100         1
10  20H-21H        20:17    22:08   100         1
11  20H-21H        20:19    22:09   100         1
12  20H-21H        20:20    22:01   100         2
13  20H-21H        20:22    22:35   100         1
14  20H-21H        20:28    22:34   100         2
15  20H-21H        20:29    22:22   100         1
16  20H-21H        20:30    22:14   100         1
17  20H-21H        20:31    22:10   100         1
18  20H-21H        20:37    22:31   100         1
19  20H-21H        20:41    22:29   100         2
20  20H-21H        20:43    22:34   100         1
21  20H-21H        20:46    22:39   100         1
22  20H-21H        20:47    22:35   100         1
23  20H-21H        20:52    22:34   100         1
24  20H-21H        20:52    23:09   100         1
25  20H-21H        20:53    21:59   100         1
26  20H-21H        20:54    21:49   100         1
27  20H-21H        20:56    22:10   100         1
28  20H-21H        20:56    22:55   100         1
29  21H-22H        21:03    22:51   100         1
30  21H-22H        21:05    23:12   100         1
31  21H-22H        21:08    22:59   100         1
32  21H-22H        21:10    23:27   100         1
33  21H-22H        21:10    23:30   100         1
34  21H-22H        21:15    23:23   100         1
35  21H-22H        21:19    21:56   100         1
36  21H-22H        21:21    22:48   100         1
37  21H-22H        21:25    23:26   100         1
38  21H-22H        21:25    23:32   100         1
39  21H-22H        21:27    22:55   100         1
40  21H-22H        21:27    23:32   100         1
41  21H-22H        21:33    23:11   100         1
42  21H-22H        21:37    23:04   100         1
43  21H-22H        21:39    00:05   100         1
44  21H-22H        21:40    23:08   100         1
45  21H-22H        21:45    23:04   100         1
46  21H-22H        21:49    00:06   100         1
47  21H-22H        21:54    00:07   100         1
48  21H-22H        21:58    00:02   100         1
49  21H-22H        21:58    00:24   100         1
50  22H-23H        22:10    00:19   100         1

Expected output should be:

   start_timeslot   needed_power_kW   
0   18H-19H         100        
1   19H-20H         700        
2   20H-21H         3100        
3   21H-22H         4600        
4   22H-23H         4600
5   23H-00H         1800
6   00H-01H         600

I am adding an example from the real dataset, because with the proposed solutions I cannot get the correct answer for all options.

mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count']
data = [['00H-01H', '00:05', '00:09', 200.0, 1],
       ['00H-01H', '00:35', '00:39', 200.0, 1],
       ['01H-02H', '01:05', '01:09', 200.0, 1],
       ['05H-06H', '05:34', '05:41', 200.0, 1],
       ['05H-06H', '05:54', '06:01', 200.0, 1],
       ['06H-07H', '06:20', '06:27', 200.0, 1],
       ['06H-07H', '06:44', '06:47', 200.0, 1],
       ['06H-07H', '06:59', '07:06', 200.0, 1],
       ['07H-08H', '07:18', '07:22', 200.0, 1],
       ['07H-08H', '07:36', '07:40', 200.0, 1],
       ['07H-08H', '07:49', '07:56', 200.0, 1],
       ['08H-09H', '08:01', '08:05', 200.0, 1],
       ['08H-09H', '08:08', '08:14', 200.0, 1],
       ['08H-09H', '08:14', '08:20', 200.0, 1],
       ['08H-09H', '08:21', '08:26', 200.0, 1],
       ['08H-09H', '08:28', '08:35', 200.0, 1],
       ['08H-09H', '08:35', '08:42', 200.0, 1],
       ['08H-09H', '08:42', '08:46', 200.0, 1],
       ['08H-09H', '08:49', '08:56', 200.0, 1],
       ['08H-09H', '08:55', '09:02', 200.0, 1],
       ['08H-09H', '09:00', '09:04', 200.0, 1],
       ['09H-10H', '09:07', '09:13', 200.0, 1],
       ['09H-10H', '09:13', '09:20', 200.0, 1],
       ['09H-10H', '09:25', '09:29', 200.0, 1],
       ['09H-10H', '09:31', '09:36', 200.0, 1],
       ['09H-10H', '09:37', '09:42', 200.0, 1],
       ['09H-10H', '09:43', '09:47', 200.0, 1],
       ['09H-10H', '09:49', '09:56', 200.0, 1],
       ['09H-10H', '09:55', '10:02', 200.0, 1],
       ['10H-11H', '10:01', '10:05', 200.0, 1],
       ['10H-11H', '10:06', '10:13', 200.0, 1],
       ['10H-11H', '10:12', '10:16', 200.0, 1],
       ['10H-11H', '10:18', '10:23', 200.0, 1],
       ['10H-11H', '10:23', '10:27', 200.0, 1],
       ['10H-11H', '10:29', '10:33', 200.0, 1],
       ['10H-11H', '10:35', '10:39', 200.0, 1],
       ['10H-11H', '10:41', '10:45', 200.0, 1],
       ['10H-11H', '10:47', '10:52', 200.0, 1],
       ['10H-11H', '10:53', '10:57', 200.0, 1],
       ['10H-11H', '10:59', '11:03', 200.0, 1],
       ['11H-12H', '11:05', '11:09', 200.0, 1],
       ['11H-12H', '11:11', '11:15', 200.0, 1],
       ['11H-12H', '11:17', '11:21', 200.0, 1],
       ['11H-12H', '11:23', '11:27', 200.0, 1],
       ['11H-12H', '11:29', '11:33', 200.0, 1],
       ['11H-12H', '11:35', '11:39', 200.0, 1],
       ['11H-12H', '11:41', '11:46', 200.0, 1],
       ['11H-12H', '11:47', '11:51', 200.0, 1],
       ['11H-12H', '11:53', '11:57', 200.0, 1],
       ['11H-12H', '11:59', '12:03', 200.0, 1],
       ['12H-13H', '12:05', '12:09', 200.0, 1],
       ['12H-13H', '12:11', '12:15', 200.0, 1],
       ['12H-13H', '12:17', '12:21', 200.0, 1],
       ['12H-13H', '12:22', '12:26', 200.0, 1],
       ['12H-13H', '12:28', '12:33', 200.0, 1],
       ['12H-13H', '12:33', '12:37', 200.0, 1],
       ['12H-13H', '12:38', '12:43', 200.0, 1],
       ['12H-13H', '12:43', '12:47', 200.0, 1],
       ['12H-13H', '12:49', '12:53', 200.0, 1],
       ['12H-13H', '12:55', '12:59', 200.0, 1],
       ['13H-14H', '13:01', '13:05', 200.0, 1],
       ['13H-14H', '13:07', '13:12', 200.0, 1],
       ['13H-14H', '13:13', '13:17', 200.0, 1],
       ['13H-14H', '13:19', '13:23', 200.0, 1],
       ['13H-14H', '13:25', '13:29', 200.0, 1],
       ['13H-14H', '13:31', '13:35', 200.0, 1],
       ['13H-14H', '13:37', '13:41', 200.0, 1],
       ['13H-14H', '13:43', '13:47', 200.0, 1],
       ['13H-14H', '13:49', '13:53', 200.0, 1],
       ['13H-14H', '13:55', '13:59', 200.0, 1],
       ['14H-15H', '14:01', '14:06', 200.0, 1],
       ['14H-15H', '14:07', '14:11', 200.0, 1],
       ['14H-15H', '14:13', '14:17', 200.0, 1],
       ['14H-15H', '14:19', '14:23', 200.0, 1],
       ['14H-15H', '14:25', '14:29', 200.0, 1],
       ['14H-15H', '14:31', '14:35', 200.0, 1],
       ['14H-15H', '14:37', '14:41', 200.0, 1],
       ['14H-15H', '14:42', '14:46', 200.0, 1],
       ['14H-15H', '14:48', '14:52', 200.0, 1],
       ['14H-15H', '14:54', '14:58', 200.0, 1],
       ['14H-15H', '14:59', '15:03', 200.0, 1],
       ['15H-16H', '15:04', '15:08', 200.0, 1],
       ['15H-16H', '15:09', '15:13', 200.0, 1],
       ['15H-16H', '15:15', '15:19', 200.0, 1],
       ['15H-16H', '15:21', '15:25', 200.0, 1],
       ['15H-16H', '15:27', '15:31', 200.0, 1],
       ['15H-16H', '15:33', '15:37', 200.0, 1],
       ['15H-16H', '15:39', '15:43', 200.0, 1],
       ['15H-16H', '15:45', '15:49', 200.0, 1],
       ['15H-16H', '15:51', '15:55', 200.0, 1],
       ['15H-16H', '15:57', '16:01', 200.0, 1],
       ['16H-17H', '16:03', '16:07', 200.0, 1],
       ['16H-17H', '16:09', '16:13', 200.0, 1],
       ['16H-17H', '16:15', '16:19', 200.0, 1],
       ['16H-17H', '16:21', '16:27', 200.0, 1],
       ['16H-17H', '16:27', '16:31', 200.0, 1],
       ['16H-17H', '16:33', '16:37', 200.0, 1],
       ['16H-17H', '16:39', '16:43', 200.0, 1],
       ['16H-17H', '16:45', '16:49', 200.0, 1],
       ['16H-17H', '16:51', '16:55', 200.0, 1],
       ['16H-17H', '16:57', '17:01', 200.0, 1],
       ['17H-18H', '17:04', '17:08', 200.0, 1],
       ['17H-18H', '17:10', '17:14', 200.0, 1],
       ['17H-18H', '17:16', '17:20', 200.0, 1],
       ['17H-18H', '17:22', '17:26', 200.0, 1],
       ['17H-18H', '17:28', '17:32', 200.0, 1],
       ['17H-18H', '17:34', '17:38', 200.0, 1],
       ['17H-18H', '17:40', '17:44', 200.0, 1],
       ['17H-18H', '17:46', '17:50', 200.0, 1],
       ['17H-18H', '17:51', '17:55', 200.0, 1],
       ['17H-18H', '17:57', '18:01', 200.0, 1],
       ['18H-19H', '18:03', '18:07', 200.0, 1],
       ['18H-19H', '18:09', '18:13', 200.0, 1],
       ['18H-19H', '18:15', '18:19', 200.0, 1],
       ['18H-19H', '18:21', '18:25', 200.0, 1],
       ['18H-19H', '18:27', '18:31', 200.0, 1],
       ['18H-19H', '18:33', '18:37', 200.0, 1],
       ['18H-19H', '18:39', '18:43', 200.0, 1],
       ['18H-19H', '18:45', '18:49', 200.0, 1],
       ['18H-19H', '18:51', '18:55', 200.0, 1],
       ['18H-19H', '18:57', '19:01', 200.0, 1],
       ['19H-20H', '19:04', '19:08', 200.0, 1],
       ['19H-20H', '19:11', '19:15', 200.0, 1],
       ['19H-20H', '19:18', '19:22', 200.0, 1],
       ['19H-20H', '19:25', '19:29', 200.0, 1],
       ['19H-20H', '19:32', '19:36', 200.0, 1],
       ['19H-20H', '19:39', '19:43', 200.0, 1],
       ['19H-20H', '19:46', '19:50', 200.0, 1],
       ['19H-20H', '19:52', '19:56', 200.0, 1],
       ['20H-21H', '20:07', '20:11', 200.0, 1],
       ['20H-21H', '20:13', '20:17', 200.0, 1],
       ['20H-21H', '20:20', '20:24', 200.0, 1],
       ['20H-21H', '20:28', '20:32', 200.0, 1],
       ['20H-21H', '20:35', '20:39', 200.0, 1],
       ['20H-21H', '20:42', '20:46', 200.0, 1],
       ['20H-21H', '20:49', '20:53', 200.0, 1],
       ['20H-21H', '20:55', '20:59', 200.0, 1],
       ['21H-22H', '21:10', '21:14', 200.0, 1],
       ['21H-22H', '21:23', '21:24', 200.0, 1],
       ['21H-22H', '21:39', '21:43', 200.0, 1],
       ['22H-23H', '22:08', '22:12', 200.0, 1],
       ['22H-23H', '22:38', '22:42', 200.0, 1],
       ['23H-00H', '23:08', '23:12', 200.0, 1],
       ['23H-00H', '23:36', '23:40', 200.0, 1]]

df = pd.DataFrame(data, columns=mycolumns)

730

asked Nov 15 '21 11:11

ctrl_z

Video Answer

3 Answers

There are two main steps to this solution. First, you need to create a new column that you can group over. Simply using start_timeslot won't work as rows in earlier timeslots that have not stopped charging can still affect later start_timeslot groups.

New groups can be created using the start and end values together with date_range. Note that extra care must be taken when the end value is in a new day.

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df.loc[df['start'] > df['end'], 'end'] = df['end'] + pd.DateOffset(1)

df['hours'] = df.apply(lambda row: pd.date_range(row['start'].floor('H'), row['end'].floor('H'), freq='H').hour, axis=1)

Intermediate result:

   start_timeslot               start                 end  power_kW  vehicle_count                                        hours
0         18H-19H 2021-11-18 18:00:00 2021-11-18 20:00:00       100              1      Int64Index([18, 19, 20], dtype='int64')
1         19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00       100              1      Int64Index([19, 20, 21], dtype='int64')
2         19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00       100              1      Int64Index([19, 20, 21], dtype='int64')
3         19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00       100              1      Int64Index([19, 20, 21], dtype='int64')
4         19H-20H 2021-11-18 19:00:00 2021-11-18 20:00:00       100              2          Int64Index([19, 20], dtype='int64')
5         19H-20H 2021-11-18 19:00:00 2021-11-18 22:00:00       100              1  Int64Index([19, 20, 21, 22], dtype='int64')

Now, we can explode the new column and then group by it.

The second step is to compute the maximum required power for a group. We can simply take the start and end times and sort them. We consider the start time as positive (we add the required power) and end times as negative (subtract the power). Next, we sort by time and then use cumsum on the series to get the cumulative sum and then take the maximum value of this. This will be the maximum required power for a group.

Code:

def func(df):
    df['power'] = df['power_kW'] * df['vehicle_count']
    pos = df.set_index('start')['power']
    neg = df.set_index('end')['power'] * -1
    df_merge = pd.concat([pos, neg]).reset_index()
    maximum = df_merge.sort_values(['index', 'power'], ascending=[True, False])['power'].cumsum().max()
    return maximum

res = df.explode('hours').groupby('hours').apply(func)

Note that if there is a car that stops charging at e.g. 10:35 and another one that starts at 10:35, this will be considered an overlap with the above code. If these should not overlap then change the sorting to use ascending=[True, True].

Result:

hours
0      600
18     100
19     700
20    3100
21    4700
22    4600
23    1800
dtype: int64

The start_timeslot column is gone so some final postprocessing to obtain the expected result is required:

res = res.to_frame('needed_power_kW').reset_index().rename(columns={'hours': 'start_timeslot'})
res['end_hour'] = (pd.to_datetime(res['start_timeslot'], unit='h') + pd.DateOffset(hours=1)).dt.hour
res['start_timeslot'] = res['start_timeslot'].astype(str).str.zfill(2) + 'H-' + res['end_hour'].astype(str).str.zfill(2) + 'H'
res = res.drop('end_hour', axis=1)

Final result:

  start_timeslot  needed_power_kW
0        00H-01H              600
1        18H-19H              100
2        19H-20H              700
3        20H-21H             3100
4        21H-22H             4600
5        22H-23H             4600
6        23H-00H             1800

Using the additional data:

  start_timeslot  needed_power_kW
0        00H-01H            200.0
1        01H-02H            200.0
2        05H-06H            200.0
3        06H-07H            200.0
4        07H-08H            200.0
5        08H-09H            400.0
6        09H-10H            400.0
7        10H-11H            400.0
8        11H-12H            200.0
9        12H-13H            400.0
10       13H-14H            200.0
11       14H-15H            200.0
12       15H-16H            200.0
13       16H-17H            400.0
14       17H-18H            200.0
15       18H-19H            200.0
16       19H-20H            200.0
17       20H-21H            200.0
18       21H-22H            200.0
19       22H-23H            200.0
20       23H-00H            200.0

Note: by changing the sorting as described above, the needed_power_kW for both 12H-13H and 16H-17H will be 200.

183

answered Nov 04 '22 20:11

Shaido

I would propose this approach. First, change all the times to minutes per day. (If you have a larger dataset over days, you could adapt the code below using datetime objects with the appropriate resolution, e.g. seconds.) That is done using the following function

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def toMinutes(timeString):
    t = [int(i) for i in timeString.split(':')]
    t = 60 * t[0] + t[1]
    return(t)

Next, I would iterate through your dataset noting the times where there were increases and decreases in usage along with the amount of power. You don't explicitly mention tracking the number of vehicles in your question but I am going to track that as well as power since they are pretty much the same from a counting point of view.

def add_row(timeString, power, nVehicle, arrive = True):
    t = toMinutes(timeString)
    delta = power * nVehicle
    if arrive:
        return([t / 60, delta, nVehicle])
    else:
        return([t  / 60, -delta, - nVehicle])

Running through your dataframe we can create a list of times where vehicles arrived or left and monitor the resulting change in power consumption. So each entry in your dataframe results in two entries in the list below, one when the vehicle arrives and one when it leaves. You stated in your question that a vehicle arriving at 20:50 and another leaving at 20:50 had to count as two vehicles at 20:50, so do all the arrivals first and then all the departures. That way when you sort below, it will count the arrivals before the departures.


x = [[0, 0, 0],] # initialize list (mostly for plotting, you can skip this if not plotting)

x += [add_row(t, P, n) \
     for t, P, n in zip(df['start'], df['power_kW'], df['vehicle_count'])]
     
x += [add_row(t, P, n, arrive = False) \
      for t, P, n in zip(df['end'], df['power_kW'], df['vehicle_count'])]

Now sort x by the time column so we have an ordered time series of discrete events that we care about. The power consumption and number of vehicles is just the sum of past events in our dataset.

x = np.array(sorted(x))
t = x[:,0]
P = np.cumsum(x[:,1])
N = np.cumsum(x[:,2])

I like plotting things so here is a script to make sure the code is doing what we want so far.

def plot_events(t, P, n):
    Nevent = len(P)
    to = []
    Po = []
    No = []
    for i in range(1, Nevent):
        to.append(t[i])
        to.append(t[i])
        Po.append(P[i-1])
        Po.append(P[i])
        No.append(N[i-1])
        No.append(N[i])

    fig, ax = plt.subplots(figsize = (6,8), nrows=2)
    ax[0].plot(to, Po)
    ax[0].set_xlabel('Time in Hours')
    ax[0].set_ylabel('Power in kW')

    ax[1].plot(to, No)
    ax[1].set_xlabel('Time in Hours')
    ax[1].set_ylabel('Number of vehicles charging')
    return(fig, ax)

fig, ax = plot_events(t, P, N)

events

Now to bin the events in time, we just look at the maximum demand in each time window of interest.

def bin_events(t, y, dt = 1, mint = 0, maxt = 24):
    # find max value of y in t bins of size dt
    x = np.arange(mint, maxt, dt)
    z = np.zeros(x.shape)
    for i in range(len(x)):
        try:
            z[i] = np.max(y[(t >= x[i]) & (t < x[i] + dt)])
        except ValueError:
            pass
    return(z)

maxP = bin_events(t, P)
maxN = bin_events(t, N)

And again, plot to make sure we're doing what we want:

ax[0].bar(np.arange(0,24) + 0.5, maxP, color = 'k', alpha = 0.3, width = 1)
ax[1].bar(np.arange(0,24) + 0.5, maxN, color = 'k', alpha = 0.3, width = 1)

events with max values

Finally, put the new info to a dataframe or add it to your existing one:

col = 'start_timeslot'
choices = [f"{i:02}H-{(i+1)%24:02}H" for i in range(24)]
conditions = [eval(f"df[col] == '{c}'") for c in choices]
summary_data_df = pd.DataFrame(np.array([choices, maxP, maxN]).transpose(),
                               columns=['start_timeslot', 'max_power_kW', 'max_vehicles'])    

df["max_power_kW"] = np.select(conditions, maxP, default=0)
df["max_vehicles"] = np.select(conditions, maxN, default=0)

   start_timeslot max_power_kW max_vehicles
0         00H-01H        200.0          1.0
1         01H-02H        200.0          1.0
2         02H-03H          0.0          0.0
3         03H-04H          0.0          0.0
4         04H-05H          0.0          0.0
5         05H-06H        200.0          1.0
6         06H-07H        200.0          1.0
7         07H-08H        200.0          1.0
8         08H-09H        400.0          2.0
9         09H-10H        400.0          2.0
10        10H-11H        400.0          2.0
11        11H-12H        200.0          1.0
12        12H-13H        200.0          1.0
13        13H-14H        200.0          1.0
14        14H-15H        200.0          1.0
15        15H-16H        200.0          1.0
16        16H-17H        200.0          1.0
17        17H-18H        200.0          1.0
18        18H-19H        200.0          1.0
19        19H-20H        200.0          1.0
20        20H-21H        200.0          1.0
21        21H-22H        200.0          1.0
22        22H-23H        200.0          1.0
23        23H-00H        200.0          1.0

answered Nov 04 '22 20:11

ramzeek

I have found an answer to your question. Code is below:

    import pandas as pd
mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count']                 #Column header
data = [['18H-19H', '18:26', '20:27', 100, 1], ['19H-20H', '19:06', '21:16', 100, 1],       #Data
        ['19H-20H', '19:19', '21:16', 100, 1], ['19H-20H', '19:36', '21:33', 100, 1], 
        ['19H-20H', '19:54', '20:19', 100, 2], ['19H-20H', '19:55', '22:01', 100, 1],
        ['20H-21H', '20:04', '22:06', 100, 1], ['20H-21H', '20:11', '22:04', 100, 2], 
        ['20H-21H', '20:15', '22:04', 100, 1], ['20H-21H', '20:16', '22:04', 100, 1], 
        ['20H-21H', '20:17', '22:08', 100, 1], ['20H-21H', '20:19', '22:09', 100, 1],
        ['20H-21H', '20:20', '22:01', 100, 2], ['20H-21H', '20:22', '22:35', 100, 1], 
        ['20H-21H', '20:28', '22:34', 100, 2], ['20H-21H', '20:29', '22:22', 100, 1], 
        ['20H-21H', '20:30', '22:14', 100, 1], ['20H-21H', '20:31', '22:10', 100, 1],
        ['20H-21H', '20:37', '22:31', 100, 1], ['20H-21H', '20:41', '22:29', 100, 2], 
        ['20H-21H', '20:43', '22:34', 100, 1], ['20H-21H', '20:46', '22:39', 100, 1], 
        ['20H-21H', '20:47', '22:35', 100, 1], ['20H-21H', '20:52', '22:34', 100, 1],
        ['20H-21H', '20:52', '23:09', 100, 1], ['20H-21H', '20:53', '21:59', 100, 1], 
        ['20H-21H', '20:54', '21:49', 100, 1], ['20H-21H', '20:56', '22:10', 100, 1], 
        ['20H-21H', '20:56', '22:55', 100, 1], ['21H-22H', '21:03', '22:51', 100, 1],
        ['21H-22H', '21:05', '23:12', 100, 1], ['21H-22H', '21:08', '22:59', 100, 1], 
        ['21H-22H', '21:10', '23:27', 100, 1], ['21H-22H', '21:10', '23:30', 100, 1], 
        ['21H-22H', '21:15', '23:23', 100, 1], ['21H-22H', '21:19', '21:56', 100, 1],
        ['21H-22H', '21:21', '22:48', 100, 1], ['21H-22H', '21:25', '23:26', 100, 1], 
        ['21H-22H', '21:25', '23:32', 100, 1], ['21H-22H', '21:27', '22:55', 100, 1], 
        ['21H-22H', '21:27', '23:32', 100, 1], ['21H-22H', '21:33', '23:11', 100, 1],
        ['21H-22H', '21:37', '23:04', 100, 1], ['21H-22H', '21:39', '00:05', 100, 1], 
        ['21H-22H', '21:40', '23:08', 100, 1], ['21H-22H', '21:45', '23:04', 100, 1], 
        ['21H-22H', '21:49', '00:06', 100, 1], ['21H-22H', '21:54', '00:07', 100, 1],
        ['21H-22H', '21:58', '00:02', 100, 1], ['21H-22H', '21:58', '00:24', 100, 1], 
        ['22H-23H', '22:10', '00:19', 100, 1]]
        

per_hour_columns = ['hour', 'power_kW']         #Header for individual vehicles
per_hour = []       #Empty data

x = 0
while(x < len(data)):       #Loop through all data
    y = int(data[x][1].split(':')[0])
    while(y != int(data[x][2].split(':')[0])+1):      #Loop though all hours the vehicle stayed
        if(y >= 24):
            y = 0
        i = 0
        while(i < data[x][4]):          #Loop through number of vehicles
            per_hour.append([str(y), data[x][3]])       #Append vehicle to array
            i+=1
        y+=1
    x+=1

per_hour_df = pd.DataFrame(per_hour, columns=per_hour_columns)      #Turn to dataframe


total_hours_header = ['time_slot', 'total_power_kW']        #header for total power per time slot
total_hours = []

i = 0
while(i < 24):      #Loop through all hours
    total = int(per_hour_df[per_hour_df['hour'] == str(i)].sum()[1])        #Get total for timeslot
    total_hours.append([str(i) + "H-" + str(i+1) + "H", total])             #Append time slot to array
    i+=1

total_hours_df = pd.DataFrame(total_hours, columns=total_hours_header)      #Turn to dataframe

print(total_hours_df)       #Print

I hope this works for you.

Output:

          time_slot  total_power_kW
0      0H-1H             600
1      1H-2H               0
2      2H-3H               0
3      3H-4H               0
4      4H-5H               0
5      5H-6H               0
6      6H-7H               0
7      7H-8H               0
8      8H-9H               0
9     9H-10H               0
10   10H-11H               0
11   11H-12H               0
12   12H-13H               0
13   13H-14H               0
14   14H-15H               0
15   15H-16H               0
16   16H-17H               0
17   17H-18H               0
18   18H-19H             100
19   19H-20H             700
20   20H-21H            3400
21   21H-22H            5200
22   22H-23H            4700
23   23H-24H            1800

To explain: I first create a new array which has one slot per vehicle (instead of having a vehicle count greater than 1). I also duplicate each vehicle for every hour is stayed. Next I looped through all hours (0-24) and appended the sum of all the elements with the current hour I am trying to find to a new array. This gives you an array which contains all 24 time slots with their corresponding power usage.

EDIT: I have changed the code slightly so that it records times that pass from hour 23 to 00

answered Nov 04 '22 21:11

SolveItPlanet

Related questions
                            
                                Read .pptx file from s3
                            
                                Matplotlib figure '.supxlabel' does not work
                            
                                Unable to access the updated global variable's value
                            
                                How to get the pivot lines from two tab-separated files?
                            
                                Update XML with an SQL query
                            
                                Most efficient way to find neighbors of neighbors in python
                            
                                turn "string-like" list into int with python [duplicate]
                            
                                AttributeError: module 'keras.utils.generic_utils' has no attribute 'populate_dict_with_module_objects'
                            
                                Run only tests which depend on the change
                            
                                Pandas - Explode multiple columns in pandas and assign value based on the exploded column
                            
                                Spacy: How to get all words that describe a noun?
                            
                                Why is np.size("") 1?
                            
                                How to type hint a Callable of a function with default arguments?
                            
                                Split a multifasta file to files with the same number of accesion numbers
                            
                                Implementing python decorators in a toy example
                            
                                Use of Generic and TypeVar
                            
                                Strange behaviour when mixing abstractmethod, classmethod and property decorators
                            
                                How to avoid to start hundreds of threads when starting (very short) actions at different timings in the future
                            
                                How to randomly set inputs to zero in keras during training autoencoder (callback)?
                            
                                Find out how similar a set is compared to all other sets in a collection of sets

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With