Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting the occurence based on overlapping intervals (HH:MM) in Pandas

Tags:

python

pandas

I have a dataset with the exact time and quantity of e-vehicles charging at a charging station and its charging power in kW.

What I am trying to do is to calculate how many e-vehicles are charging at the exact same time (taking into account start and end times), and according to that, calculate the required charging power (sum up the power_kW) needed for each hour. For example, from 18-19h I need only 100kW of power.

Below I shared a part of the data, I obviously have the data for all the 24 hours, and for each hour I should have a total needed charging station power in kW. Obviously, if one vehicle finishes charging at 20:50, and another one starts at 20:50 - they overlap and I need to count 2*power_kW. Also, in some slots, I have more than 1 vehicle starting and finishing at the same time.

I find it hard to do it fast and automatically with the code, given that the times are in hour:minute format, so I seek help here. I checked similar posts and couldn't find anything to bring me to a solution on my own.

Does someone have an idea how to do this efficently and correctly?

Input data:

import pandas as pd
mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count']
data = [['18H-19H', '18:26', '20:27', 100, 1], ['19H-20H', '19:06', '21:16', 100, 1], 
        ['19H-20H', '19:19', '21:16', 100, 1], ['19H-20H', '19:36', '21:33', 100, 1], 
        ['19H-20H', '19:54', '20:19', 100, 2], ['19H-20H', '19:55', '22:01', 100, 1],
        ['20H-21H', '20:04', '22:06', 100, 1], ['20H-21H', '20:11', '22:04', 100, 2], 
        ['20H-21H', '20:15', '22:04', 100, 1], ['20H-21H', '20:16', '22:04', 100, 1], 
        ['20H-21H', '20:17', '22:08', 100, 1], ['20H-21H', '20:19', '22:09', 100, 1],
        ['20H-21H', '20:20', '22:01', 100, 2], ['20H-21H', '20:22', '22:35', 100, 1], 
        ['20H-21H', '20:28', '22:34', 100, 2], ['20H-21H', '20:29', '22:22', 100, 1], 
        ['20H-21H', '20:30', '22:14', 100, 1], ['20H-21H', '20:31', '22:10', 100, 1],
        ['20H-21H', '20:37', '22:31', 100, 1], ['20H-21H', '20:41', '22:29', 100, 2], 
        ['20H-21H', '20:43', '22:34', 100, 1], ['20H-21H', '20:46', '22:39', 100, 1], 
        ['20H-21H', '20:47', '22:35', 100, 1], ['20H-21H', '20:52', '22:34', 100, 1],
        ['20H-21H', '20:52', '23:09', 100, 1], ['20H-21H', '20:53', '21:59', 100, 1], 
        ['20H-21H', '20:54', '21:49', 100, 1], ['20H-21H', '20:56', '22:10', 100, 1], 
        ['20H-21H', '20:56', '22:55', 100, 1], ['21H-22H', '21:03', '22:51', 100, 1],
        ['21H-22H', '21:05', '23:12', 100, 1], ['21H-22H', '21:08', '22:59', 100, 1], 
        ['21H-22H', '21:10', '23:27', 100, 1], ['21H-22H', '21:10', '23:30', 100, 1], 
        ['21H-22H', '21:15', '23:23', 100, 1], ['21H-22H', '21:19', '21:56', 100, 1],
        ['21H-22H', '21:21', '22:48', 100, 1], ['21H-22H', '21:25', '23:26', 100, 1], 
        ['21H-22H', '21:25', '23:32', 100, 1], ['21H-22H', '21:27', '22:55', 100, 1], 
        ['21H-22H', '21:27', '23:32', 100, 1], ['21H-22H', '21:33', '23:11', 100, 1],
        ['21H-22H', '21:37', '23:04', 100, 1], ['21H-22H', '21:39', '00:05', 100, 1], 
        ['21H-22H', '21:40', '23:08', 100, 1], ['21H-22H', '21:45', '23:04', 100, 1], 
        ['21H-22H', '21:49', '00:06', 100, 1], ['21H-22H', '21:54', '00:07', 100, 1],
        ['21H-22H', '21:58', '00:02', 100, 1], ['21H-22H', '21:58', '00:24', 100, 1], 
        ['22H-23H', '22:10', '00:19', 100, 1]]
        
df = pd.DataFrame(data, columns=mycolumns)

df
   start_timeslot   start   end     power_kW   vehicle_count
0   18H-19H        18:26    20:27   100         1
1   19H-20H        19:06    21:16   100         1
2   19H-20H        19:19    21:16   100         1
3   19H-20H        19:36    21:33   100         1
4   19H-20H        19:54    20:19   100         2
5   19H-20H        19:55    22:01   100         1
6   20H-21H        20:04    22:06   100         1
7   20H-21H        20:11    22:04   100         2
8   20H-21H        20:15    22:04   100         1
9   20H-21H        20:16    22:04   100         1
10  20H-21H        20:17    22:08   100         1
11  20H-21H        20:19    22:09   100         1
12  20H-21H        20:20    22:01   100         2
13  20H-21H        20:22    22:35   100         1
14  20H-21H        20:28    22:34   100         2
15  20H-21H        20:29    22:22   100         1
16  20H-21H        20:30    22:14   100         1
17  20H-21H        20:31    22:10   100         1
18  20H-21H        20:37    22:31   100         1
19  20H-21H        20:41    22:29   100         2
20  20H-21H        20:43    22:34   100         1
21  20H-21H        20:46    22:39   100         1
22  20H-21H        20:47    22:35   100         1
23  20H-21H        20:52    22:34   100         1
24  20H-21H        20:52    23:09   100         1
25  20H-21H        20:53    21:59   100         1
26  20H-21H        20:54    21:49   100         1
27  20H-21H        20:56    22:10   100         1
28  20H-21H        20:56    22:55   100         1
29  21H-22H        21:03    22:51   100         1
30  21H-22H        21:05    23:12   100         1
31  21H-22H        21:08    22:59   100         1
32  21H-22H        21:10    23:27   100         1
33  21H-22H        21:10    23:30   100         1
34  21H-22H        21:15    23:23   100         1
35  21H-22H        21:19    21:56   100         1
36  21H-22H        21:21    22:48   100         1
37  21H-22H        21:25    23:26   100         1
38  21H-22H        21:25    23:32   100         1
39  21H-22H        21:27    22:55   100         1
40  21H-22H        21:27    23:32   100         1
41  21H-22H        21:33    23:11   100         1
42  21H-22H        21:37    23:04   100         1
43  21H-22H        21:39    00:05   100         1
44  21H-22H        21:40    23:08   100         1
45  21H-22H        21:45    23:04   100         1
46  21H-22H        21:49    00:06   100         1
47  21H-22H        21:54    00:07   100         1
48  21H-22H        21:58    00:02   100         1
49  21H-22H        21:58    00:24   100         1
50  22H-23H        22:10    00:19   100         1

Expected output should be:

   start_timeslot   needed_power_kW   
0   18H-19H         100        
1   19H-20H         700        
2   20H-21H         3100        
3   21H-22H         4600        
4   22H-23H         4600
5   23H-00H         1800
6   00H-01H         600

I am adding an example from the real dataset, because with the proposed solutions I cannot get the correct answer for all options.

mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count']
data = [['00H-01H', '00:05', '00:09', 200.0, 1],
       ['00H-01H', '00:35', '00:39', 200.0, 1],
       ['01H-02H', '01:05', '01:09', 200.0, 1],
       ['05H-06H', '05:34', '05:41', 200.0, 1],
       ['05H-06H', '05:54', '06:01', 200.0, 1],
       ['06H-07H', '06:20', '06:27', 200.0, 1],
       ['06H-07H', '06:44', '06:47', 200.0, 1],
       ['06H-07H', '06:59', '07:06', 200.0, 1],
       ['07H-08H', '07:18', '07:22', 200.0, 1],
       ['07H-08H', '07:36', '07:40', 200.0, 1],
       ['07H-08H', '07:49', '07:56', 200.0, 1],
       ['08H-09H', '08:01', '08:05', 200.0, 1],
       ['08H-09H', '08:08', '08:14', 200.0, 1],
       ['08H-09H', '08:14', '08:20', 200.0, 1],
       ['08H-09H', '08:21', '08:26', 200.0, 1],
       ['08H-09H', '08:28', '08:35', 200.0, 1],
       ['08H-09H', '08:35', '08:42', 200.0, 1],
       ['08H-09H', '08:42', '08:46', 200.0, 1],
       ['08H-09H', '08:49', '08:56', 200.0, 1],
       ['08H-09H', '08:55', '09:02', 200.0, 1],
       ['08H-09H', '09:00', '09:04', 200.0, 1],
       ['09H-10H', '09:07', '09:13', 200.0, 1],
       ['09H-10H', '09:13', '09:20', 200.0, 1],
       ['09H-10H', '09:25', '09:29', 200.0, 1],
       ['09H-10H', '09:31', '09:36', 200.0, 1],
       ['09H-10H', '09:37', '09:42', 200.0, 1],
       ['09H-10H', '09:43', '09:47', 200.0, 1],
       ['09H-10H', '09:49', '09:56', 200.0, 1],
       ['09H-10H', '09:55', '10:02', 200.0, 1],
       ['10H-11H', '10:01', '10:05', 200.0, 1],
       ['10H-11H', '10:06', '10:13', 200.0, 1],
       ['10H-11H', '10:12', '10:16', 200.0, 1],
       ['10H-11H', '10:18', '10:23', 200.0, 1],
       ['10H-11H', '10:23', '10:27', 200.0, 1],
       ['10H-11H', '10:29', '10:33', 200.0, 1],
       ['10H-11H', '10:35', '10:39', 200.0, 1],
       ['10H-11H', '10:41', '10:45', 200.0, 1],
       ['10H-11H', '10:47', '10:52', 200.0, 1],
       ['10H-11H', '10:53', '10:57', 200.0, 1],
       ['10H-11H', '10:59', '11:03', 200.0, 1],
       ['11H-12H', '11:05', '11:09', 200.0, 1],
       ['11H-12H', '11:11', '11:15', 200.0, 1],
       ['11H-12H', '11:17', '11:21', 200.0, 1],
       ['11H-12H', '11:23', '11:27', 200.0, 1],
       ['11H-12H', '11:29', '11:33', 200.0, 1],
       ['11H-12H', '11:35', '11:39', 200.0, 1],
       ['11H-12H', '11:41', '11:46', 200.0, 1],
       ['11H-12H', '11:47', '11:51', 200.0, 1],
       ['11H-12H', '11:53', '11:57', 200.0, 1],
       ['11H-12H', '11:59', '12:03', 200.0, 1],
       ['12H-13H', '12:05', '12:09', 200.0, 1],
       ['12H-13H', '12:11', '12:15', 200.0, 1],
       ['12H-13H', '12:17', '12:21', 200.0, 1],
       ['12H-13H', '12:22', '12:26', 200.0, 1],
       ['12H-13H', '12:28', '12:33', 200.0, 1],
       ['12H-13H', '12:33', '12:37', 200.0, 1],
       ['12H-13H', '12:38', '12:43', 200.0, 1],
       ['12H-13H', '12:43', '12:47', 200.0, 1],
       ['12H-13H', '12:49', '12:53', 200.0, 1],
       ['12H-13H', '12:55', '12:59', 200.0, 1],
       ['13H-14H', '13:01', '13:05', 200.0, 1],
       ['13H-14H', '13:07', '13:12', 200.0, 1],
       ['13H-14H', '13:13', '13:17', 200.0, 1],
       ['13H-14H', '13:19', '13:23', 200.0, 1],
       ['13H-14H', '13:25', '13:29', 200.0, 1],
       ['13H-14H', '13:31', '13:35', 200.0, 1],
       ['13H-14H', '13:37', '13:41', 200.0, 1],
       ['13H-14H', '13:43', '13:47', 200.0, 1],
       ['13H-14H', '13:49', '13:53', 200.0, 1],
       ['13H-14H', '13:55', '13:59', 200.0, 1],
       ['14H-15H', '14:01', '14:06', 200.0, 1],
       ['14H-15H', '14:07', '14:11', 200.0, 1],
       ['14H-15H', '14:13', '14:17', 200.0, 1],
       ['14H-15H', '14:19', '14:23', 200.0, 1],
       ['14H-15H', '14:25', '14:29', 200.0, 1],
       ['14H-15H', '14:31', '14:35', 200.0, 1],
       ['14H-15H', '14:37', '14:41', 200.0, 1],
       ['14H-15H', '14:42', '14:46', 200.0, 1],
       ['14H-15H', '14:48', '14:52', 200.0, 1],
       ['14H-15H', '14:54', '14:58', 200.0, 1],
       ['14H-15H', '14:59', '15:03', 200.0, 1],
       ['15H-16H', '15:04', '15:08', 200.0, 1],
       ['15H-16H', '15:09', '15:13', 200.0, 1],
       ['15H-16H', '15:15', '15:19', 200.0, 1],
       ['15H-16H', '15:21', '15:25', 200.0, 1],
       ['15H-16H', '15:27', '15:31', 200.0, 1],
       ['15H-16H', '15:33', '15:37', 200.0, 1],
       ['15H-16H', '15:39', '15:43', 200.0, 1],
       ['15H-16H', '15:45', '15:49', 200.0, 1],
       ['15H-16H', '15:51', '15:55', 200.0, 1],
       ['15H-16H', '15:57', '16:01', 200.0, 1],
       ['16H-17H', '16:03', '16:07', 200.0, 1],
       ['16H-17H', '16:09', '16:13', 200.0, 1],
       ['16H-17H', '16:15', '16:19', 200.0, 1],
       ['16H-17H', '16:21', '16:27', 200.0, 1],
       ['16H-17H', '16:27', '16:31', 200.0, 1],
       ['16H-17H', '16:33', '16:37', 200.0, 1],
       ['16H-17H', '16:39', '16:43', 200.0, 1],
       ['16H-17H', '16:45', '16:49', 200.0, 1],
       ['16H-17H', '16:51', '16:55', 200.0, 1],
       ['16H-17H', '16:57', '17:01', 200.0, 1],
       ['17H-18H', '17:04', '17:08', 200.0, 1],
       ['17H-18H', '17:10', '17:14', 200.0, 1],
       ['17H-18H', '17:16', '17:20', 200.0, 1],
       ['17H-18H', '17:22', '17:26', 200.0, 1],
       ['17H-18H', '17:28', '17:32', 200.0, 1],
       ['17H-18H', '17:34', '17:38', 200.0, 1],
       ['17H-18H', '17:40', '17:44', 200.0, 1],
       ['17H-18H', '17:46', '17:50', 200.0, 1],
       ['17H-18H', '17:51', '17:55', 200.0, 1],
       ['17H-18H', '17:57', '18:01', 200.0, 1],
       ['18H-19H', '18:03', '18:07', 200.0, 1],
       ['18H-19H', '18:09', '18:13', 200.0, 1],
       ['18H-19H', '18:15', '18:19', 200.0, 1],
       ['18H-19H', '18:21', '18:25', 200.0, 1],
       ['18H-19H', '18:27', '18:31', 200.0, 1],
       ['18H-19H', '18:33', '18:37', 200.0, 1],
       ['18H-19H', '18:39', '18:43', 200.0, 1],
       ['18H-19H', '18:45', '18:49', 200.0, 1],
       ['18H-19H', '18:51', '18:55', 200.0, 1],
       ['18H-19H', '18:57', '19:01', 200.0, 1],
       ['19H-20H', '19:04', '19:08', 200.0, 1],
       ['19H-20H', '19:11', '19:15', 200.0, 1],
       ['19H-20H', '19:18', '19:22', 200.0, 1],
       ['19H-20H', '19:25', '19:29', 200.0, 1],
       ['19H-20H', '19:32', '19:36', 200.0, 1],
       ['19H-20H', '19:39', '19:43', 200.0, 1],
       ['19H-20H', '19:46', '19:50', 200.0, 1],
       ['19H-20H', '19:52', '19:56', 200.0, 1],
       ['20H-21H', '20:07', '20:11', 200.0, 1],
       ['20H-21H', '20:13', '20:17', 200.0, 1],
       ['20H-21H', '20:20', '20:24', 200.0, 1],
       ['20H-21H', '20:28', '20:32', 200.0, 1],
       ['20H-21H', '20:35', '20:39', 200.0, 1],
       ['20H-21H', '20:42', '20:46', 200.0, 1],
       ['20H-21H', '20:49', '20:53', 200.0, 1],
       ['20H-21H', '20:55', '20:59', 200.0, 1],
       ['21H-22H', '21:10', '21:14', 200.0, 1],
       ['21H-22H', '21:23', '21:24', 200.0, 1],
       ['21H-22H', '21:39', '21:43', 200.0, 1],
       ['22H-23H', '22:08', '22:12', 200.0, 1],
       ['22H-23H', '22:38', '22:42', 200.0, 1],
       ['23H-00H', '23:08', '23:12', 200.0, 1],
       ['23H-00H', '23:36', '23:40', 200.0, 1]]

df = pd.DataFrame(data, columns=mycolumns)
like image 730
ctrl_z Avatar asked Nov 15 '21 11:11

ctrl_z


People also ask

How do I count occurrences in a specific column in pandas?

Pandas GroupBy – Count occurrences in column. Using the size () or count () method with pandas.DataFrame.groupby () will generate the count of a number of occurrences of data present in a particular column of the dataframe. However, this operation can also be performed using pandas.Series.value_counts () and, pandas.Index.value_counts ().

How many times does a string occur in a pandas Dataframe?

The following code shows how to count the number of occurrences of a specific string in a column of a pandas DataFrame: From the output we can see that the string ‘B’ occurs 4 times in the ‘team’ column. Note that we can also use the following syntax to find how frequently each unique value occurs in the ‘team’ column:

How to quickly check if intervals overlap or not?

Note that if intervals are sorted by decreasing order of start times, we can quickly check if intervals overlap or not by comparing the start time of the previous interval with the end time of the current interval. Below is the implementation of the above algorithm.

How to separate count and groupby occurrences in a CSV file?

Here, we separate count occurrences and combined count occurrences of the categorical columns present in a CSV file. The basic approach to use this method is to assign the column names as parameters in the groupby () method and then using the count () with it.


Video Answer


3 Answers

There are two main steps to this solution. First, you need to create a new column that you can group over. Simply using start_timeslot won't work as rows in earlier timeslots that have not stopped charging can still affect later start_timeslot groups.

New groups can be created using the start and end values together with date_range. Note that extra care must be taken when the end value is in a new day.

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'])
df.loc[df['start'] > df['end'], 'end'] = df['end'] + pd.DateOffset(1)

df['hours'] = df.apply(lambda row: pd.date_range(row['start'].floor('H'), row['end'].floor('H'), freq='H').hour, axis=1)

Intermediate result:

   start_timeslot               start                 end  power_kW  vehicle_count                                        hours
0         18H-19H 2021-11-18 18:00:00 2021-11-18 20:00:00       100              1      Int64Index([18, 19, 20], dtype='int64')
1         19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00       100              1      Int64Index([19, 20, 21], dtype='int64')
2         19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00       100              1      Int64Index([19, 20, 21], dtype='int64')
3         19H-20H 2021-11-18 19:00:00 2021-11-18 21:00:00       100              1      Int64Index([19, 20, 21], dtype='int64')
4         19H-20H 2021-11-18 19:00:00 2021-11-18 20:00:00       100              2          Int64Index([19, 20], dtype='int64')
5         19H-20H 2021-11-18 19:00:00 2021-11-18 22:00:00       100              1  Int64Index([19, 20, 21, 22], dtype='int64')

Now, we can explode the new column and then group by it.

The second step is to compute the maximum required power for a group. We can simply take the start and end times and sort them. We consider the start time as positive (we add the required power) and end times as negative (subtract the power). Next, we sort by time and then use cumsum on the series to get the cumulative sum and then take the maximum value of this. This will be the maximum required power for a group.

Code:

def func(df):
    df['power'] = df['power_kW'] * df['vehicle_count']
    pos = df.set_index('start')['power']
    neg = df.set_index('end')['power'] * -1
    df_merge = pd.concat([pos, neg]).reset_index()
    maximum = df_merge.sort_values(['index', 'power'], ascending=[True, False])['power'].cumsum().max()
    return maximum

res = df.explode('hours').groupby('hours').apply(func)

Note that if there is a car that stops charging at e.g. 10:35 and another one that starts at 10:35, this will be considered an overlap with the above code. If these should not overlap then change the sorting to use ascending=[True, True].

Result:

hours
0      600
18     100
19     700
20    3100
21    4700
22    4600
23    1800
dtype: int64

The start_timeslot column is gone so some final postprocessing to obtain the expected result is required:

res = res.to_frame('needed_power_kW').reset_index().rename(columns={'hours': 'start_timeslot'})
res['end_hour'] = (pd.to_datetime(res['start_timeslot'], unit='h') + pd.DateOffset(hours=1)).dt.hour
res['start_timeslot'] = res['start_timeslot'].astype(str).str.zfill(2) + 'H-' + res['end_hour'].astype(str).str.zfill(2) + 'H'
res = res.drop('end_hour', axis=1)

Final result:

  start_timeslot  needed_power_kW
0        00H-01H              600
1        18H-19H              100
2        19H-20H              700
3        20H-21H             3100
4        21H-22H             4600
5        22H-23H             4600
6        23H-00H             1800

Using the additional data:

  start_timeslot  needed_power_kW
0        00H-01H            200.0
1        01H-02H            200.0
2        05H-06H            200.0
3        06H-07H            200.0
4        07H-08H            200.0
5        08H-09H            400.0
6        09H-10H            400.0
7        10H-11H            400.0
8        11H-12H            200.0
9        12H-13H            400.0
10       13H-14H            200.0
11       14H-15H            200.0
12       15H-16H            200.0
13       16H-17H            400.0
14       17H-18H            200.0
15       18H-19H            200.0
16       19H-20H            200.0
17       20H-21H            200.0
18       21H-22H            200.0
19       22H-23H            200.0
20       23H-00H            200.0

Note: by changing the sorting as described above, the needed_power_kW for both 12H-13H and 16H-17H will be 200.

like image 183
Shaido Avatar answered Nov 04 '22 20:11

Shaido


I would propose this approach. First, change all the times to minutes per day. (If you have a larger dataset over days, you could adapt the code below using datetime objects with the appropriate resolution, e.g. seconds.) That is done using the following function

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def toMinutes(timeString):
    t = [int(i) for i in timeString.split(':')]
    t = 60 * t[0] + t[1]
    return(t)

Next, I would iterate through your dataset noting the times where there were increases and decreases in usage along with the amount of power. You don't explicitly mention tracking the number of vehicles in your question but I am going to track that as well as power since they are pretty much the same from a counting point of view.

def add_row(timeString, power, nVehicle, arrive = True):
    t = toMinutes(timeString)
    delta = power * nVehicle
    if arrive:
        return([t / 60, delta, nVehicle])
    else:
        return([t  / 60, -delta, - nVehicle])

Running through your dataframe we can create a list of times where vehicles arrived or left and monitor the resulting change in power consumption. So each entry in your dataframe results in two entries in the list below, one when the vehicle arrives and one when it leaves. You stated in your question that a vehicle arriving at 20:50 and another leaving at 20:50 had to count as two vehicles at 20:50, so do all the arrivals first and then all the departures. That way when you sort below, it will count the arrivals before the departures.


x = [[0, 0, 0],] # initialize list (mostly for plotting, you can skip this if not plotting)

x += [add_row(t, P, n) \
     for t, P, n in zip(df['start'], df['power_kW'], df['vehicle_count'])]
     
x += [add_row(t, P, n, arrive = False) \
      for t, P, n in zip(df['end'], df['power_kW'], df['vehicle_count'])]

Now sort x by the time column so we have an ordered time series of discrete events that we care about. The power consumption and number of vehicles is just the sum of past events in our dataset.

x = np.array(sorted(x))
t = x[:,0]
P = np.cumsum(x[:,1])
N = np.cumsum(x[:,2])

I like plotting things so here is a script to make sure the code is doing what we want so far.

def plot_events(t, P, n):
    Nevent = len(P)
    to = []
    Po = []
    No = []
    for i in range(1, Nevent):
        to.append(t[i])
        to.append(t[i])
        Po.append(P[i-1])
        Po.append(P[i])
        No.append(N[i-1])
        No.append(N[i])

    fig, ax = plt.subplots(figsize = (6,8), nrows=2)
    ax[0].plot(to, Po)
    ax[0].set_xlabel('Time in Hours')
    ax[0].set_ylabel('Power in kW')

    ax[1].plot(to, No)
    ax[1].set_xlabel('Time in Hours')
    ax[1].set_ylabel('Number of vehicles charging')
    return(fig, ax)

fig, ax = plot_events(t, P, N)

events

Now to bin the events in time, we just look at the maximum demand in each time window of interest.

def bin_events(t, y, dt = 1, mint = 0, maxt = 24):
    # find max value of y in t bins of size dt
    x = np.arange(mint, maxt, dt)
    z = np.zeros(x.shape)
    for i in range(len(x)):
        try:
            z[i] = np.max(y[(t >= x[i]) & (t < x[i] + dt)])
        except ValueError:
            pass
    return(z)

maxP = bin_events(t, P)
maxN = bin_events(t, N)

And again, plot to make sure we're doing what we want:

ax[0].bar(np.arange(0,24) + 0.5, maxP, color = 'k', alpha = 0.3, width = 1)
ax[1].bar(np.arange(0,24) + 0.5, maxN, color = 'k', alpha = 0.3, width = 1)

events with max values

Finally, put the new info to a dataframe or add it to your existing one:

col = 'start_timeslot'
choices = [f"{i:02}H-{(i+1)%24:02}H" for i in range(24)]
conditions = [eval(f"df[col] == '{c}'") for c in choices]
summary_data_df = pd.DataFrame(np.array([choices, maxP, maxN]).transpose(),
                               columns=['start_timeslot', 'max_power_kW', 'max_vehicles'])    

df["max_power_kW"] = np.select(conditions, maxP, default=0)
df["max_vehicles"] = np.select(conditions, maxN, default=0)
   start_timeslot max_power_kW max_vehicles
0         00H-01H        200.0          1.0
1         01H-02H        200.0          1.0
2         02H-03H          0.0          0.0
3         03H-04H          0.0          0.0
4         04H-05H          0.0          0.0
5         05H-06H        200.0          1.0
6         06H-07H        200.0          1.0
7         07H-08H        200.0          1.0
8         08H-09H        400.0          2.0
9         09H-10H        400.0          2.0
10        10H-11H        400.0          2.0
11        11H-12H        200.0          1.0
12        12H-13H        200.0          1.0
13        13H-14H        200.0          1.0
14        14H-15H        200.0          1.0
15        15H-16H        200.0          1.0
16        16H-17H        200.0          1.0
17        17H-18H        200.0          1.0
18        18H-19H        200.0          1.0
19        19H-20H        200.0          1.0
20        20H-21H        200.0          1.0
21        21H-22H        200.0          1.0
22        22H-23H        200.0          1.0
23        23H-00H        200.0          1.0
like image 36
ramzeek Avatar answered Nov 04 '22 20:11

ramzeek


I have found an answer to your question. Code is below:

    import pandas as pd
mycolumns = ['start_timeslot', 'start', 'end', 'power_kW', 'vehicle_count']                 #Column header
data = [['18H-19H', '18:26', '20:27', 100, 1], ['19H-20H', '19:06', '21:16', 100, 1],       #Data
        ['19H-20H', '19:19', '21:16', 100, 1], ['19H-20H', '19:36', '21:33', 100, 1], 
        ['19H-20H', '19:54', '20:19', 100, 2], ['19H-20H', '19:55', '22:01', 100, 1],
        ['20H-21H', '20:04', '22:06', 100, 1], ['20H-21H', '20:11', '22:04', 100, 2], 
        ['20H-21H', '20:15', '22:04', 100, 1], ['20H-21H', '20:16', '22:04', 100, 1], 
        ['20H-21H', '20:17', '22:08', 100, 1], ['20H-21H', '20:19', '22:09', 100, 1],
        ['20H-21H', '20:20', '22:01', 100, 2], ['20H-21H', '20:22', '22:35', 100, 1], 
        ['20H-21H', '20:28', '22:34', 100, 2], ['20H-21H', '20:29', '22:22', 100, 1], 
        ['20H-21H', '20:30', '22:14', 100, 1], ['20H-21H', '20:31', '22:10', 100, 1],
        ['20H-21H', '20:37', '22:31', 100, 1], ['20H-21H', '20:41', '22:29', 100, 2], 
        ['20H-21H', '20:43', '22:34', 100, 1], ['20H-21H', '20:46', '22:39', 100, 1], 
        ['20H-21H', '20:47', '22:35', 100, 1], ['20H-21H', '20:52', '22:34', 100, 1],
        ['20H-21H', '20:52', '23:09', 100, 1], ['20H-21H', '20:53', '21:59', 100, 1], 
        ['20H-21H', '20:54', '21:49', 100, 1], ['20H-21H', '20:56', '22:10', 100, 1], 
        ['20H-21H', '20:56', '22:55', 100, 1], ['21H-22H', '21:03', '22:51', 100, 1],
        ['21H-22H', '21:05', '23:12', 100, 1], ['21H-22H', '21:08', '22:59', 100, 1], 
        ['21H-22H', '21:10', '23:27', 100, 1], ['21H-22H', '21:10', '23:30', 100, 1], 
        ['21H-22H', '21:15', '23:23', 100, 1], ['21H-22H', '21:19', '21:56', 100, 1],
        ['21H-22H', '21:21', '22:48', 100, 1], ['21H-22H', '21:25', '23:26', 100, 1], 
        ['21H-22H', '21:25', '23:32', 100, 1], ['21H-22H', '21:27', '22:55', 100, 1], 
        ['21H-22H', '21:27', '23:32', 100, 1], ['21H-22H', '21:33', '23:11', 100, 1],
        ['21H-22H', '21:37', '23:04', 100, 1], ['21H-22H', '21:39', '00:05', 100, 1], 
        ['21H-22H', '21:40', '23:08', 100, 1], ['21H-22H', '21:45', '23:04', 100, 1], 
        ['21H-22H', '21:49', '00:06', 100, 1], ['21H-22H', '21:54', '00:07', 100, 1],
        ['21H-22H', '21:58', '00:02', 100, 1], ['21H-22H', '21:58', '00:24', 100, 1], 
        ['22H-23H', '22:10', '00:19', 100, 1]]
        

per_hour_columns = ['hour', 'power_kW']         #Header for individual vehicles
per_hour = []       #Empty data

x = 0
while(x < len(data)):       #Loop through all data
    y = int(data[x][1].split(':')[0])
    while(y != int(data[x][2].split(':')[0])+1):      #Loop though all hours the vehicle stayed
        if(y >= 24):
            y = 0
        i = 0
        while(i < data[x][4]):          #Loop through number of vehicles
            per_hour.append([str(y), data[x][3]])       #Append vehicle to array
            i+=1
        y+=1
    x+=1

per_hour_df = pd.DataFrame(per_hour, columns=per_hour_columns)      #Turn to dataframe


total_hours_header = ['time_slot', 'total_power_kW']        #header for total power per time slot
total_hours = []

i = 0
while(i < 24):      #Loop through all hours
    total = int(per_hour_df[per_hour_df['hour'] == str(i)].sum()[1])        #Get total for timeslot
    total_hours.append([str(i) + "H-" + str(i+1) + "H", total])             #Append time slot to array
    i+=1

total_hours_df = pd.DataFrame(total_hours, columns=total_hours_header)      #Turn to dataframe

print(total_hours_df)       #Print

I hope this works for you.

Output:

          time_slot  total_power_kW
0      0H-1H             600
1      1H-2H               0
2      2H-3H               0
3      3H-4H               0
4      4H-5H               0
5      5H-6H               0
6      6H-7H               0
7      7H-8H               0
8      8H-9H               0
9     9H-10H               0
10   10H-11H               0
11   11H-12H               0
12   12H-13H               0
13   13H-14H               0
14   14H-15H               0
15   15H-16H               0
16   16H-17H               0
17   17H-18H               0
18   18H-19H             100
19   19H-20H             700
20   20H-21H            3400
21   21H-22H            5200
22   22H-23H            4700
23   23H-24H            1800

To explain: I first create a new array which has one slot per vehicle (instead of having a vehicle count greater than 1). I also duplicate each vehicle for every hour is stayed. Next I looped through all hours (0-24) and appended the sum of all the elements with the current hour I am trying to find to a new array. This gives you an array which contains all 24 time slots with their corresponding power usage.

EDIT: I have changed the code slightly so that it records times that pass from hour 23 to 00

like image 24
SolveItPlanet Avatar answered Nov 04 '22 21:11

SolveItPlanet