I would like to visualize the number of machines required in a job shop over time, in a graph with a continuous time axis on the x-axis and the number of concurrent shifts on the y-axis.
In the dataframe below you find an example of my data: Shift_IDs (which are unique) and the start and end time of each shift. Over the course of a day, I would like to see how many machines are needed in each interval. The interval can be 5 minutes, a quarter of an hour, half an hour, or an hour.
df:
Shift_ID Shift_Time_Start Shift_Time_End
0 1 2016-03-22 9:00:00 2016-03-22 9:35:00
1 2 2016-03-22 9:20:00 2016-03-22 10:20:00
2 3 2016-03-22 9:40:00 2016-03-22 10:14:00
3 4 2016-03-22 10:00:00 2016-03-22 10:31:00
In this example, in the quarter 9:30-9:45 I would need 3 machines to be able to run every shift active at that time. The desired output would look something like this:
df2:
Interval Count
0 2016-03-22 9:00:00 - 2016-03-22 9:15:00 1
1 2016-03-22 9:15:00 - 2016-03-22 9:30:00 2
2 2016-03-22 9:30:00 - 2016-03-22 9:45:00 3
3 2016-03-22 9:45:00 - 2016-03-22 10:00:00 2
4 2016-03-22 10:00:00 - 2016-03-22 10:15:00 2
5 2016-03-22 10:15:00 - 2016-03-22 10:30:00 2
6 2016-03-22 10:30:00 - 2016-03-22 10:45:00 1
With this data frame I could round each interval down to its lower boundary and then plot it in a graph.
I get stuck on how to "see" whether a shift lies within multiple intervals. Do you have any ideas how to tackle this?
NB: All date-time values are of course datetime type
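For a single shift and a single interval, the check itself is just an overlap test; here is a minimal sketch of what I mean (the timestamps are hypothetical, taken from the example above):
import pandas as pd

# one interval of the grid and one shift, just to illustrate the check
begin = pd.Timestamp('2016-03-22 09:30:00')
end = pd.Timestamp('2016-03-22 09:45:00')
shift_start = pd.Timestamp('2016-03-22 09:00:00')  # Shift_ID 1
shift_end = pd.Timestamp('2016-03-22 09:35:00')
# the shift needs a machine in this interval iff the two ranges overlap
overlaps = (shift_start <= end) and (shift_end >= begin)  # True here
print(overlaps)
What I can't work out is how to apply this test across every interval and every shift at once.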
EDIT, after the solutions of MaxU and knightofni
I used MaxU's plotting code to plot the results of both of your solutions. They both seem to do well at 15 minutes, but please take a look at your results at 5 minutes:
MaxU: [5-minute plot]
knightofni: [5-minute plot]
EDIT 2, 4 April 2016
This is not too easy. I can't really think of a way to do it in a completely vectorized way, but here are two approaches that will work.
1- Reorganize your data so that you have only one datetime column. The objective is to have, for each Shift_ID, one row per minimum interval. Then you will be able to do a TimeGrouper groupby.
Working example:
import pandas as pd
import arrow
data = {
    'Shift_ID': [1, 2, 3, 4],
    'Shift_Time_Start': [arrow.get('2016-03-22 09:00:00').datetime,
                         arrow.get('2016-03-22 09:20:00').datetime,
                         arrow.get('2016-03-22 09:40:00').datetime,
                         arrow.get('2016-03-22 10:00:00').datetime],
    'Shift_Time_End': [arrow.get('2016-03-22 09:35:00').datetime,
                       arrow.get('2016-03-22 10:20:00').datetime,
                       arrow.get('2016-03-22 10:14:00').datetime,
                       arrow.get('2016-03-22 10:31:00').datetime],
}
df = pd.DataFrame(data)
min_int = '5T'  # minimum interval: 5 minutes ('T' stands for minutes)
df
Shift_ID Shift_Time_End Shift_Time_Start
0 1 2016-03-22 09:35:00+00:00 2016-03-22 09:00:00+00:00
1 2 2016-03-22 10:20:00+00:00 2016-03-22 09:20:00+00:00
2 3 2016-03-22 10:14:00+00:00 2016-03-22 09:40:00+00:00
3 4 2016-03-22 10:31:00+00:00 2016-03-22 10:00:00+00:00
new_data = {'time': [], 'Shift_ID': []}  # dict to hold the data
for row in df.iterrows():
    # create the list of all interval timestamps of this shift, from start to end
    dates = pd.date_range(row[1].Shift_Time_Start, row[1].Shift_Time_End, freq=min_int)
    for date in dates:
        new_data['time'].append(date)
        new_data['Shift_ID'].append(row[1].Shift_ID)
# creating the new df
newdf = pd.DataFrame(new_data).set_index('time')
newdf.head()
Shift_ID
time
2016-03-22 09:00:00+00:00 1
2016-03-22 09:05:00+00:00 1
2016-03-22 09:10:00+00:00 1
2016-03-22 09:15:00+00:00 1
2016-03-22 09:20:00+00:00 1
# We group by the time index, resampling every min_int
# (in our case 5 minutes, represented by '5T'),
# then count how many unique Shift_IDs fall into each bin.
newdf.groupby(pd.TimeGrouper(freq=min_int)).agg({'Shift_ID': 'nunique'})
Shift_ID
time
2016-03-22 09:00:00+00:00 1
2016-03-22 09:05:00+00:00 1
2016-03-22 09:10:00+00:00 1
2016-03-22 09:15:00+00:00 1
2016-03-22 09:20:00+00:00 2
2016-03-22 09:25:00+00:00 2
2016-03-22 09:30:00+00:00 2
2016-03-22 09:35:00+00:00 2
2016-03-22 09:40:00+00:00 2
This reads as: at 9:15 there was one shift going on, while at 9:20 there were 2.
This is not exactly your desired output, but I'd argue that it is much easier to plot. If you want to match your desired output, it should be quite easy (just use .shift to create a copy of the time column moved by one).
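For instance, a minimal sketch of that .shift idea (the names res, Count, End and Interval are just illustrative, building on the grouped result from above):
# turn the grouped result into begin/end interval rows
res = (newdf.groupby(pd.TimeGrouper(freq=min_int))  # pd.Grouper in newer pandas
            .agg({'Shift_ID': 'nunique'})
            .reset_index()
            .rename(columns={'Shift_ID': 'Count'}))
# the end of each interval is the start of the next one;
# the last row has no successor, so fill it with time + min_int
res['End'] = res['time'].shift(-1).fillna(res['time'] + pd.Timedelta(min_int))
res['Interval'] = res['time'].astype(str) + ' - ' + res['End'].astype(str)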
Edit: Link to notebook with code
You can do it this way:
Code:
import io
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
# load sample data into DF (data frame)
data="""\
idx;Shift_ID;Shift_Time_Start;Shift_Time_End
0;1;2016-03-22 09:00:00;2016-03-22 09:35:00
1;2;2016-03-22 09:20:00;2016-03-22 10:20:00
2;3;2016-03-22 09:40:00;2016-03-22 10:14:00
3;4;2016-03-22 10:00:00;2016-03-22 10:31:00
4;5;2016-03-22 08:11:00;2016-03-22 09:35:00
4;6;2016-03-23 14:11:00;2016-03-23 14:23:00
"""
df = pd.read_csv(io.StringIO(data), sep=';', index_col=0,
                 parse_dates=['Shift_Time_Start', 'Shift_Time_End'])
# time interval
freq = '10min'
# prepare resulting DF with desired intervals
a = pd.DataFrame({
    'begin': pd.date_range(df.Shift_Time_Start.min(),
                           df.Shift_Time_End.max(), freq=freq)
})
# upsample the interval starts to a 5-minute grid, so that the
# 10-minute intervals overlap each other by 5 minutes
a = a.set_index('begin').resample(rule='5min').first().reset_index()
a['end'] = a.begin + pd.Timedelta(freq)
# count the number of unique Shift_IDs in `df` for each interval in `a`:
# a shift overlaps the interval [begin, end] iff it starts no later than
# the interval ends and ends no earlier than the interval begins
# (this also catches shifts that lie entirely inside one interval)
def f(x):
    return df[(df.Shift_Time_Start <= x.end)
              & (df.Shift_Time_End >= x.begin)].Shift_ID.nunique()
a['count'] = a.apply(f, axis=1)
# remove rows without any shifts
a = a[a['count'] > 0].reset_index(drop=True)
a['interval'] = (a.begin.dt.strftime('%d.%m %H:%M')
                 + ' - '
                 + a.end.dt.strftime('%d.%m %H:%M'))
a = a.set_index('interval')[['count']]
print(a)
matplotlib.style.use('ggplot')
a.plot(kind='bar', alpha=0.75)
fig = plt.gcf()
fig.subplots_adjust(bottom=0.2)
plt.show()
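If the row-wise apply turns out to be slow for fine intervals, the same overlap test can also be broadcast as arrays. This is a minimal sketch under the assumption that a still has plain begin and end columns, i.e. it would run in place of the apply step above; since the Shift_IDs are unique per row, counting matching rows equals counting unique IDs:
begin = a['begin'].values[:, None]      # shape (n_intervals, 1)
end = a['end'].values[:, None]
starts = df['Shift_Time_Start'].values  # shape (n_shifts,)
ends = df['Shift_Time_End'].values
# row i, column j is True iff interval i overlaps shift j;
# summing over the columns gives the shift count per interval
a['count'] = ((starts <= end) & (ends >= begin)).sum(axis=1)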
Source data set:
In [135]: df
Out[135]:
Shift_ID Shift_Time_Start Shift_Time_End
idx
0 1 2016-03-22 09:00:00 2016-03-22 09:35:00
1 2 2016-03-22 09:20:00 2016-03-22 10:20:00
2 3 2016-03-22 09:40:00 2016-03-22 10:14:00
3 4 2016-03-22 10:00:00 2016-03-22 10:31:00
4 5 2016-03-22 08:11:00 2016-03-22 09:35:00
4 6 2016-03-23 14:11:00 2016-03-23 14:23:00
In [136]: a
Out[136]:
count
interval
22.03 08:10 - 22.03 08:20 1
22.03 08:15 - 22.03 08:25 1
22.03 08:20 - 22.03 08:30 1
22.03 08:25 - 22.03 08:35 1
22.03 08:30 - 22.03 08:40 1
22.03 08:35 - 22.03 08:45 1
22.03 08:40 - 22.03 08:50 1
22.03 08:45 - 22.03 08:55 1
22.03 08:50 - 22.03 09:00 2
22.03 08:55 - 22.03 09:05 2
22.03 09:00 - 22.03 09:10 2
22.03 09:05 - 22.03 09:15 2
22.03 09:10 - 22.03 09:20 3
22.03 09:15 - 22.03 09:25 3
22.03 09:20 - 22.03 09:30 3
22.03 09:25 - 22.03 09:35 3
22.03 09:30 - 22.03 09:40 4
22.03 09:35 - 22.03 09:45 4
22.03 09:40 - 22.03 09:50 2
22.03 09:45 - 22.03 09:55 2
22.03 09:50 - 22.03 10:00 3
22.03 09:55 - 22.03 10:05 3
22.03 10:00 - 22.03 10:10 3
22.03 10:05 - 22.03 10:15 3
22.03 10:10 - 22.03 10:20 3
22.03 10:15 - 22.03 10:25 2
22.03 10:20 - 22.03 10:30 2
22.03 10:25 - 22.03 10:35 1
22.03 10:30 - 22.03 10:40 1
23.03 14:05 - 23.03 14:15 1
23.03 14:10 - 23.03 14:20 1
23.03 14:15 - 23.03 14:25 1
23.03 14:20 - 23.03 14:30 1