
Calculate required equipment on shifts in timespans

I would like to visualize the number of machines required in a job shop at a given time as a graph, with a continuous time axis on the x-axis and the number of shifts on the y-axis.

The DataFrame below shows an example of my data: unique Shift_IDs with the start and end time of each shift. Over the period of a day, I would like to see how many machines are needed in each interval. The interval could be 5 minutes, a quarter of an hour, half an hour or an hour.

   Shift_ID     Shift_Time_Start       Shift_Time_End
0         1   2016-03-22 9:00:00   2016-03-22 9:35:00
1         2   2016-03-22 9:20:00  2016-03-22 10:20:00
2         3   2016-03-22 9:40:00  2016-03-22 10:14:00
3         4  2016-03-22 10:00:00  2016-03-22 10:31:00

In this example, I would need 3 machines in the quarter 9:30-9:45 to be able to run every shift at that time. The desired output would look something like this:


                                    Interval  Count
0    2016-03-22 9:00:00 - 2016-03-22 9:15:00      1
1    2016-03-22 9:15:00 - 2016-03-22 9:30:00      2
2    2016-03-22 9:30:00 - 2016-03-22 9:45:00      3
3   2016-03-22 9:45:00 - 2016-03-22 10:00:00      2
4  2016-03-22 10:00:00 - 2016-03-22 10:15:00      3
5  2016-03-22 10:15:00 - 2016-03-22 10:30:00      2
6  2016-03-22 10:30:00 - 2016-03-22 10:45:00      1

With this DataFrame I could round each interval to its lower boundary and then plot it in a graph.

I am stuck on how to "see" whether a shift spans multiple intervals. Do you have any ideas on how to tackle this?

NB: All date-time values are, of course, of datetime type.
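The check being asked about can be expressed as an interval-overlap test: a shift needs a machine during an interval if it starts before the interval ends and ends after the interval begins. A minimal sketch, assuming half-open intervals `[begin, end)` and the four sample shifts above; the names `shifts`, `begin`, `end` and `overlaps` are illustrative, not from the question:

```python
import pandas as pd

# the four sample shifts from the question
shifts = pd.DataFrame({
    'Shift_ID': [1, 2, 3, 4],
    'Shift_Time_Start': pd.to_datetime(['2016-03-22 09:00', '2016-03-22 09:20',
                                        '2016-03-22 09:40', '2016-03-22 10:00']),
    'Shift_Time_End': pd.to_datetime(['2016-03-22 09:35', '2016-03-22 10:20',
                                      '2016-03-22 10:14', '2016-03-22 10:31']),
})

# one interval, treated as half-open: [09:30, 09:45)
begin = pd.Timestamp('2016-03-22 09:30')
end = pd.Timestamp('2016-03-22 09:45')

# a shift overlaps the interval if it starts before the interval ends and
# ends after the interval begins; this also covers shifts that span several
# intervals or lie entirely inside one
overlaps = (shifts.Shift_Time_Start < end) & (shifts.Shift_Time_End > begin)
print(overlaps.sum())  # 3
```

Applying the same test to each interval of a `pd.date_range` would give the whole `Count` column.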

EDIT after the solutions of MaxU and knightofni

I used MaxU's plotting code to plot the results of both solutions. Both seem to work well with 15-minute intervals, but please take a look at your results with 5-minute intervals:

[Image: plot of one solution's results with 5-minute intervals]

[Image: plot of the other solution's results with 5-minute intervals]

EDIT 2 (4 April 2015)

asked by Uis234, Oct 31 '22 06:10


2 Answers

This is not trivial. I can't think of a way to do it in a completely vectorized way, but here are 2 approaches that will work.

1. Reorganize your data so that you have only one datetime column. The objective is to have, for each Shift_ID, one row per minimum interval. Then you can do a time-based (TimeGrouper) groupby.

Working example:

Recreating your DataFrame

import pandas as pd
import arrow

data = {
    'Shift_ID' : [1,2,3,4],
    'Shift_Time_Start' : [arrow.get('2016-03-22 09:00:00').datetime,
                          arrow.get('2016-03-22 09:20:00').datetime,
                          arrow.get('2016-03-22 09:40:00').datetime,
                          arrow.get('2016-03-22 10:00:00').datetime],

    'Shift_Time_End' : [arrow.get('2016-03-22 09:35:00').datetime,
                        arrow.get('2016-03-22 10:20:00').datetime,
                        arrow.get('2016-03-22 10:14:00').datetime,
                        arrow.get('2016-03-22 10:31:00').datetime],
}

df = pd.DataFrame(data)
min_int = '5T'  # minimum interval: 5 minutes

Shift_ID    Shift_Time_End  Shift_Time_Start
0   1   2016-03-22 09:35:00+00:00   2016-03-22 09:00:00+00:00
1   2   2016-03-22 10:20:00+00:00   2016-03-22 09:20:00+00:00
2   3   2016-03-22 10:14:00+00:00   2016-03-22 09:40:00+00:00
3   4   2016-03-22 10:31:00+00:00   2016-03-22 10:00:00+00:00

Creating the new DataFrame

new_data = {'time' : [], 'Shift_ID': []} # dict to hold the data

for _, row in df.iterrows():
    # creates a list of all dates of this shift, from start to end
    dates = pd.date_range(row.Shift_Time_Start, row.Shift_Time_End, freq=min_int)
    for date in dates:
        # one entry per (timestamp, shift) pair
        new_data['time'].append(date)
        new_data['Shift_ID'].append(row.Shift_ID)

# creating the new df
newdf = pd.DataFrame(new_data).set_index('time')

2016-03-22 09:00:00+00:00   1
2016-03-22 09:05:00+00:00   1
2016-03-22 09:10:00+00:00   1
2016-03-22 09:15:00+00:00   1
2016-03-22 09:20:00+00:00   1
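The `iterrows` loop above could also be written without visiting rows one at a time, by building one small frame per shift and concatenating them. A sketch, assuming naive timestamps instead of the `arrow`-generated UTC ones and the `'5min'` alias (equivalent to `'5T'`); the name `parts` is illustrative:

```python
import pandas as pd

min_int = '5min'  # same 5-minute step as '5T'

df = pd.DataFrame({
    'Shift_ID': [1, 2, 3, 4],
    'Shift_Time_Start': pd.to_datetime(['2016-03-22 09:00', '2016-03-22 09:20',
                                        '2016-03-22 09:40', '2016-03-22 10:00']),
    'Shift_Time_End': pd.to_datetime(['2016-03-22 09:35', '2016-03-22 10:20',
                                      '2016-03-22 10:14', '2016-03-22 10:31']),
})

# one small frame per shift: every min_int timestamp from start to end,
# paired with that shift's ID (the scalar ID is broadcast to every row)
parts = [pd.DataFrame({'time': pd.date_range(start, end, freq=min_int),
                       'Shift_ID': shift_id})
         for shift_id, start, end in zip(df.Shift_ID,
                                         df.Shift_Time_Start,
                                         df.Shift_Time_End)]

# stack them and index by time: one row per (timestamp, shift) pair
newdf = pd.concat(parts).set_index('time')
print(newdf.head())
```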

Groupby Timegrouper

# We group by the time index in bins of min_int
# (in our case 5 minutes, represented by '5T'),
# then we count how many unique Shift_IDs fall into each bin.
# (pd.TimeGrouper was deprecated and later removed; pd.Grouper(freq=...) replaces it.)
newdf.groupby(pd.Grouper(freq=min_int)).agg({'Shift_ID': 'nunique'})

2016-03-22 09:00:00+00:00   1
2016-03-22 09:05:00+00:00   1
2016-03-22 09:10:00+00:00   1
2016-03-22 09:15:00+00:00   1
2016-03-22 09:20:00+00:00   2
2016-03-22 09:25:00+00:00   2
2016-03-22 09:30:00+00:00   2
2016-03-22 09:35:00+00:00   2
2016-03-22 09:40:00+00:00   2

This reads as: at 9:15 there was one shift going on, while at 9:20 there were two.

This is not exactly your desired output, but I'd argue it is much easier to plot. Matching your desired output should be easy: just use .shift to create a copy of the date column moved by one row.
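One possible way to do the `.shift` step described above, turning per-bin counts into `begin - end` labels like in the question. This is a sketch, not the answer's own code: it reuses the first five values of the output above with naive timestamps, and the names `counts`, `out` and `fmt` are illustrative. It assumes consecutive, equally spaced bins, so the last bin's end is its begin plus one step:

```python
import pandas as pd

min_int = '5min'

# the first five bins of the groupby result above (naive timestamps here)
counts = pd.Series([1, 1, 1, 1, 2],
                   index=pd.date_range('2016-03-22 09:00', periods=5, freq=min_int))

# index -> 'begin' column, values -> 'Count' column
out = counts.rename_axis('begin').reset_index(name='Count')

# shift(-1) moves the begin column up one row, so each bin ends where the
# next one begins; the last bin has no successor, so add one step instead
last_end = out['begin'].iloc[-1] + pd.Timedelta(min_int)
out['end'] = out['begin'].shift(-1).fillna(last_end)

fmt = '%Y-%m-%d %H:%M:%S'
out['Interval'] = out['begin'].dt.strftime(fmt) + ' - ' + out['end'].dt.strftime(fmt)
print(out[['Interval', 'Count']])
```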

Edit:

Link to notebook with code

answered by knightofni, Nov 12 '22 22:11


You can do it this way:


import io
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt

# load sample data into DF (data frame)
data = """\
;Shift_ID;Shift_Time_Start;Shift_Time_End
0;1;2016-03-22 09:00:00;2016-03-22 09:35:00
1;2;2016-03-22 09:20:00;2016-03-22 10:20:00
2;3;2016-03-22 09:40:00;2016-03-22 10:14:00
3;4;2016-03-22 10:00:00;2016-03-22 10:31:00
4;5;2016-03-22 08:11:00;2016-03-22 09:35:00
4;6;2016-03-23 14:11:00;2016-03-23 14:23:00
"""
df = pd.read_csv(io.StringIO(data), sep=';', index_col=0,
                 parse_dates=['Shift_Time_Start', 'Shift_Time_End'])

# time interval
freq = '10min'

# prepare resulting DF with desired intervals
a  = pd.DataFrame({
  'begin': pd.date_range(df.Shift_Time_Start.min(),
                         df.Shift_Time_End.max(), freq=freq)
})
# resample to 5-minute steps (window starts fall on round 5-minute marks)
a = a.set_index('begin').resample(rule='5min').first().reset_index()

a['end'] = a.begin + pd.Timedelta(freq)

# count number of unique Shift_ID's in `DF` for each interval in `a`
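def f(x):
    # a shift overlaps the interval [x.begin, x.end] if it starts before the
    # interval ends and ends after it begins; unlike checking only whether
    # x.begin or x.end falls inside a shift, this also catches shifts that
    # lie entirely inside the interval
    mask = (df.Shift_Time_Start <= x.end) & (df.Shift_Time_End >= x.begin)
    return df.loc[mask, 'Shift_ID'].nunique()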
a['count'] = a.apply(f, axis=1)
# remove rows without any shifts
a = a[a['count'] > 0].reset_index(drop=True)

a['interval'] = a.begin.dt.strftime('%d.%m %H:%M').astype(str) + \
                ' - ' + \
                a.end.dt.strftime('%d.%m %H:%M').astype(str)

a = a.set_index('interval')[['count']]


a.plot(kind='bar', alpha=0.75)
fig = plt.gcf()


[Image: bar chart of the number of shifts per interval]

Source data set:

In [135]: df
     Shift_ID    Shift_Time_Start      Shift_Time_End
0           1 2016-03-22 09:00:00 2016-03-22 09:35:00
1           2 2016-03-22 09:20:00 2016-03-22 10:20:00
2           3 2016-03-22 09:40:00 2016-03-22 10:14:00
3           4 2016-03-22 10:00:00 2016-03-22 10:31:00
4           5 2016-03-22 08:11:00 2016-03-22 09:35:00
4           6 2016-03-23 14:11:00 2016-03-23 14:23:00

In [136]: a
22.03 08:10 - 22.03 08:20      1
22.03 08:15 - 22.03 08:25      1
22.03 08:20 - 22.03 08:30      1
22.03 08:25 - 22.03 08:35      1
22.03 08:30 - 22.03 08:40      1
22.03 08:35 - 22.03 08:45      1
22.03 08:40 - 22.03 08:50      1
22.03 08:45 - 22.03 08:55      1
22.03 08:50 - 22.03 09:00      2
22.03 08:55 - 22.03 09:05      2
22.03 09:00 - 22.03 09:10      2
22.03 09:05 - 22.03 09:15      2
22.03 09:10 - 22.03 09:20      3
22.03 09:15 - 22.03 09:25      3
22.03 09:20 - 22.03 09:30      3
22.03 09:25 - 22.03 09:35      3
22.03 09:30 - 22.03 09:40      4
22.03 09:35 - 22.03 09:45      4
22.03 09:40 - 22.03 09:50      2
22.03 09:45 - 22.03 09:55      2
22.03 09:50 - 22.03 10:00      3
22.03 09:55 - 22.03 10:05      3
22.03 10:00 - 22.03 10:10      3
22.03 10:05 - 22.03 10:15      3
22.03 10:10 - 22.03 10:20      3
22.03 10:15 - 22.03 10:25      2
22.03 10:20 - 22.03 10:30      2
22.03 10:25 - 22.03 10:35      1
22.03 10:30 - 22.03 10:40      1
23.03 14:05 - 23.03 14:15      1
23.03 14:10 - 23.03 14:20      1
23.03 14:15 - 23.03 14:25      1
23.03 14:20 - 23.03 14:30      1
answered by MaxU - stop WAR against UA, Nov 12 '22 22:11