I would like to visualize the number of machines required in a job shop over time, in a graph with a continuous time axis on the x-axis and the number of concurrent shifts on the y-axis.
In the dataframe below you find an example of my data: Shift_IDs (which are unique) and the start and end time of each shift. Over the course of a day, I would like to see how many machines are needed in each interval. The interval can be 5 minutes, a quarter of an hour, half an hour, or an hour.
df:
Shift_ID Shift_Time_Start Shift_Time_End
0 1 2016-03-22 9:00:00 2016-03-22 9:35:00
1 2 2016-03-22 9:20:00 2016-03-22 10:20:00
2 3 2016-03-22 9:40:00 2016-03-22 10:14:00
3 4 2016-03-22 10:00:00 2016-03-22 10:31:00
In this example, in the quarter 9:30-9:45 I would need 3 machines to be able to run every shift active at that time. The desired output would look something like this:
df2:
Interval Count
0 2016-03-22 9:00:00 - 2016-03-22 9:15:00 1
1 2016-03-22 9:15:00 - 2016-03-22 9:30:00 2
2 2016-03-22 9:30:00 - 2016-03-22 9:45:00 3
3 2016-03-22 9:45:00 - 2016-03-22 10:00:00 2
4 2016-03-22 10:00:00 - 2016-03-22 10:15:00 2
5 2016-03-22 10:15:00 - 2016-03-22 10:30:00 2
6 2016-03-22 10:30:00 - 2016-03-22 10:45:00 1
With this data frame I could round each interval down to its lower boundary and then plot it in a graph.
I get stuck on how to "see" whether a shift lies within multiple intervals. Do you have any ideas how to tackle this?
NB: All date-time values are of course datetime type
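For a single shift and a single interval, the check itself is just an overlap test; here is a minimal sketch of what I mean (the timestamps are hypothetical, taken from the example above):
import pandas as pd

# one interval of the grid and one shift, just to illustrate the check
begin = pd.Timestamp('2016-03-22 09:30:00')
end = pd.Timestamp('2016-03-22 09:45:00')
shift_start = pd.Timestamp('2016-03-22 09:00:00')  # Shift_ID 1
shift_end = pd.Timestamp('2016-03-22 09:35:00')
# the shift needs a machine in this interval iff the two ranges overlap
overlaps = (shift_start <= end) and (shift_end >= begin)  # True here
print(overlaps)
What I can't work out is how to apply this test across every interval and every shift at once.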
EDIT, after the solutions of MaxU and knightofni
I used MaxU's plotting code to plot the results of both of your solutions. They both seem to do well at 15 minutes, but please take a look at your results at 5 minutes:
MaxU: [5-minute plot]
knightofni: [5-minute plot]
EDIT 2, 4 April 2016
This is not too easy. I can't really think of a way to do it in a completely vectorized way, but here are two approaches that will work.
1- Reorganize your data so that you have only one datetime column. The objective is to have, for each Shift_ID, one row per minimum interval. Then you will be able to do a TimeGrouper groupby.
Working example:
import pandas as pd
import arrow
data = {
    'Shift_ID': [1, 2, 3, 4],
    'Shift_Time_Start': [arrow.get('2016-03-22 09:00:00').datetime,
                         arrow.get('2016-03-22 09:20:00').datetime,
                         arrow.get('2016-03-22 09:40:00').datetime,
                         arrow.get('2016-03-22 10:00:00').datetime],
    'Shift_Time_End': [arrow.get('2016-03-22 09:35:00').datetime,
                       arrow.get('2016-03-22 10:20:00').datetime,
                       arrow.get('2016-03-22 10:14:00').datetime,
                       arrow.get('2016-03-22 10:31:00').datetime],
}
df = pd.DataFrame(data)
min_int = '5T'  # minimum interval: 5 minutes ('T' stands for minutes)
df
Shift_ID Shift_Time_End Shift_Time_Start
0 1 2016-03-22 09:35:00+00:00 2016-03-22 09:00:00+00:00
1 2 2016-03-22 10:20:00+00:00 2016-03-22 09:20:00+00:00
2 3 2016-03-22 10:14:00+00:00 2016-03-22 09:40:00+00:00
3 4 2016-03-22 10:31:00+00:00 2016-03-22 10:00:00+00:00
new_data = {'time': [], 'Shift_ID': []}  # dict to hold the data
for row in df.iterrows():
    # create the list of all interval timestamps of this shift, from start to end
    dates = pd.date_range(row[1].Shift_Time_Start, row[1].Shift_Time_End, freq=min_int)
    for date in dates:
        new_data['time'].append(date)
        new_data['Shift_ID'].append(row[1].Shift_ID)
# creating the new df
newdf = pd.DataFrame(new_data).set_index('time')
newdf.head()
Shift_ID
time
2016-03-22 09:00:00+00:00 1
2016-03-22 09:05:00+00:00 1
2016-03-22 09:10:00+00:00 1
2016-03-22 09:15:00+00:00 1
2016-03-22 09:20:00+00:00 1
# We group by the time index, resampling every min_int
# (in our case 5 minutes, represented by '5T'),
# then count how many unique Shift_IDs fall into each bin.
newdf.groupby(pd.TimeGrouper(freq=min_int)).agg({'Shift_ID': 'nunique'})
Shift_ID
time
2016-03-22 09:00:00+00:00 1
2016-03-22 09:05:00+00:00 1
2016-03-22 09:10:00+00:00 1
2016-03-22 09:15:00+00:00 1
2016-03-22 09:20:00+00:00 2
2016-03-22 09:25:00+00:00 2
2016-03-22 09:30:00+00:00 2
2016-03-22 09:35:00+00:00 2
2016-03-22 09:40:00+00:00 2
This reads as: at 9:15 there was one shift going on, while at 9:20 there were 2.
This is not exactly your desired output, but I'd argue that it is much easier to plot. If you want to match your desired output, it should be quite easy (just use .shift to create a copy of the time column moved by one).
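For instance, a minimal sketch of that .shift idea (the names res, Count, End and Interval are just illustrative, building on the grouped result from above):
# turn the grouped result into begin/end interval rows
res = (newdf.groupby(pd.TimeGrouper(freq=min_int))  # pd.Grouper in newer pandas
            .agg({'Shift_ID': 'nunique'})
            .reset_index()
            .rename(columns={'Shift_ID': 'Count'}))
# the end of each interval is the start of the next one;
# the last row has no successor, so fill it with time + min_int
res['End'] = res['time'].shift(-1).fillna(res['time'] + pd.Timedelta(min_int))
res['Interval'] = res['time'].astype(str) + ' - ' + res['End'].astype(str)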
Edit: Link to notebook with code
You can do it this way:
Code:
import io
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# load sample data into DF (data frame)
data="""\
idx;Shift_ID;Shift_Time_Start;Shift_Time_End
0;1;2016-03-22 09:00:00;2016-03-22 09:35:00
1;2;2016-03-22 09:20:00;2016-03-22 10:20:00
2;3;2016-03-22 09:40:00;2016-03-22 10:14:00
3;4;2016-03-22 10:00:00;2016-03-22 10:31:00
4;5;2016-03-22 08:11:00;2016-03-22 09:35:00
4;6;2016-03-23 14:11:00;2016-03-23 14:23:00
"""
df = pd.read_csv(io.StringIO(data), sep=';', index_col=0,
                 parse_dates=['Shift_Time_Start', 'Shift_Time_End'])
# time interval
freq = '10min'
# prepare resulting DF with desired intervals
a = pd.DataFrame({
    'begin': pd.date_range(df.Shift_Time_Start.min(),
                           df.Shift_Time_End.max(), freq=freq)
})
# upsample the interval starts to a 5-minute grid, so that the
# 10-minute intervals overlap each other by 5 minutes
a = a.set_index('begin').resample(rule='5min').first().reset_index()
a['end'] = a.begin + pd.Timedelta(freq)
# count the number of unique Shift_IDs in `df` for each interval in `a`:
# a shift overlaps the interval [begin, end] iff it starts no later than
# the interval ends and ends no earlier than the interval begins
# (this also catches shifts that lie entirely inside one interval)
def f(x):
    return df[(df.Shift_Time_Start <= x.end)
              & (df.Shift_Time_End >= x.begin)].Shift_ID.nunique()
a['count'] = a.apply(f, axis=1)
# remove rows without any shifts
a = a[a['count'] > 0].reset_index(drop=True)
a['interval'] = (a.begin.dt.strftime('%d.%m %H:%M')
                 + ' - '
                 + a.end.dt.strftime('%d.%m %H:%M'))
a = a.set_index('interval')[['count']]
print(a)
matplotlib.style.use('ggplot')
a.plot(kind='bar', alpha=0.75)
fig = plt.gcf()
fig.subplots_adjust(bottom=0.2)
plt.show()
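If the row-wise apply turns out to be slow for fine intervals, the same overlap test can also be broadcast as arrays. This is a minimal sketch under the assumption that a still has plain begin and end columns, i.e. it would run in place of the apply step above; since the Shift_IDs are unique per row, counting matching rows equals counting unique IDs:
begin = a['begin'].values[:, None]      # shape (n_intervals, 1)
end = a['end'].values[:, None]
starts = df['Shift_Time_Start'].values  # shape (n_shifts,)
ends = df['Shift_Time_End'].values
# row i, column j is True iff interval i overlaps shift j;
# summing over the columns gives the shift count per interval
a['count'] = ((starts <= end) & (ends >= begin)).sum(axis=1)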
Source data set:
In [135]: df
Out[135]:
Shift_ID Shift_Time_Start Shift_Time_End
idx
0 1 2016-03-22 09:00:00 2016-03-22 09:35:00
1 2 2016-03-22 09:20:00 2016-03-22 10:20:00
2 3 2016-03-22 09:40:00 2016-03-22 10:14:00
3 4 2016-03-22 10:00:00 2016-03-22 10:31:00
4 5 2016-03-22 08:11:00 2016-03-22 09:35:00
4 6 2016-03-23 14:11:00 2016-03-23 14:23:00
In [136]: a
Out[136]:
count
interval
22.03 08:10 - 22.03 08:20 1
22.03 08:15 - 22.03 08:25 1
22.03 08:20 - 22.03 08:30 1
22.03 08:25 - 22.03 08:35 1
22.03 08:30 - 22.03 08:40 1
22.03 08:35 - 22.03 08:45 1
22.03 08:40 - 22.03 08:50 1
22.03 08:45 - 22.03 08:55 1
22.03 08:50 - 22.03 09:00 2
22.03 08:55 - 22.03 09:05 2
22.03 09:00 - 22.03 09:10 2
22.03 09:05 - 22.03 09:15 2
22.03 09:10 - 22.03 09:20 3
22.03 09:15 - 22.03 09:25 3
22.03 09:20 - 22.03 09:30 3
22.03 09:25 - 22.03 09:35 3
22.03 09:30 - 22.03 09:40 4
22.03 09:35 - 22.03 09:45 4
22.03 09:40 - 22.03 09:50 2
22.03 09:45 - 22.03 09:55 2
22.03 09:50 - 22.03 10:00 3
22.03 09:55 - 22.03 10:05 3
22.03 10:00 - 22.03 10:10 3
22.03 10:05 - 22.03 10:15 3
22.03 10:10 - 22.03 10:20 3
22.03 10:15 - 22.03 10:25 2
22.03 10:20 - 22.03 10:30 2
22.03 10:25 - 22.03 10:35 1
22.03 10:30 - 22.03 10:40 1
23.03 14:05 - 23.03 14:15 1
23.03 14:10 - 23.03 14:20 1
23.03 14:15 - 23.03 14:25 1
23.03 14:20 - 23.03 14:30 1