What is an efficient pandas method to reindex this shift schedule?

I have a pandas dataframe that represents a shift schedule for an entire year, given as:


January 2019    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
Shift A         1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1
Shift B         0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2
Shift C         0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0
Shift D         2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0
February 2019   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  nan nan nan
Shift A         0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   nan nan nan
Shift B         0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   nan nan nan
Shift C         2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   nan nan nan
Shift D         1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   nan nan nan

Here 1 represents the day shift (06:00 - 18:00), 2 represents the night shift (18:00 - 06:00 the next morning), and 0 can be ignored. Only one shift team works in any given period.
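To make the encoding concrete, here is a hypothetical per-cell helper (`shift_start` and `SHIFT_START` are illustrative names, not from the question). It turns a month label, a day number and a code into the start timestamp of that shift, assuming month labels look like "January 2019". It only illustrates the arithmetic; applying it cell by cell is exactly the kind of loop the question wants to avoid.

```python
import pandas as pd

# Hypothetical helper (not from the question or the answer): convert one cell
# of the schedule into the start of the shift it encodes.
# Codes: 1 = day shift from 06:00, 2 = night shift from 18:00, 0 = no shift.
SHIFT_START = {1: pd.Timedelta(hours=6), 2: pd.Timedelta(hours=18)}


def shift_start(month_label, day, code):
    """Return the shift start as a Timestamp, or None for code 0."""
    if code not in SHIFT_START:
        return None
    # Assumes month labels like "January 2019"; they parse to the 1st.
    month_start = pd.to_datetime(month_label, format="%B %Y")
    # Day 1 is the month start itself, so offset by day - 1 days.
    return month_start + pd.Timedelta(days=day - 1) + SHIFT_START[code]


print(shift_start("January 2019", 1, 1))  # 2019-01-01 06:00:00
print(shift_start("January 2019", 1, 2))  # 2019-01-01 18:00:00
print(shift_start("January 2019", 4, 0))  # None
```

Because a night shift starting at 18:00 on day d runs until 06:00 on day d + 1, the start time alone is enough to index each working period.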

I need the data in a format where each row is a DateTime stamp (the start of a shift) paired with the shift team working from that time, e.g.:

             DateTime Shift
0 2019-01-01 06:00:00 A
1 2019-01-01 18:00:00 D
2 2019-01-02 06:00:00 A
3 2019-01-02 18:00:00 B
4 2019-01-03 06:00:00 A
5 2019-01-03 18:00:00 B
.
.
.

What would be the most efficient pandas method to re-index the data like this, i.e. without for loops?

asked Oct 11 '19 at 12:10 by Brad



1 Answer

Use:

import pandas as pd

#df is the schedule from the question, with the first month label
#("January 2019") as the header of the first column
#get first column by position
first = df.iloc[:, 0]
#convert it to datetimes; labels that are not dates ('Shift A', ...) become NaT
dates = pd.to_datetime(first, errors='coerce')
#mask of the data (shift) rows
mask = dates.isna()
#forward fill the month dates and replace the leading NaTs by the first column name
df.index = dates.ffill().fillna(pd.to_datetime(first.name))
#keep only the shift rows and append the first column to the index
df = df[mask.values].set_index(first.name, append=True)
#convert column names (day numbers) to timedeltas in days, day 1 is 0 days
df.columns = pd.to_timedelta(df.columns.astype(int) - 1, unit='D')
#dictionary mapping 1 (day shift) and 2 (night shift) to start hours
mapp = {1: pd.Timedelta('06:00:00'), 2: pd.Timedelta('18:00:00')}
#turn 0 into NaN with mask and reshape with stack, dropping the NaNs
#(explicit dropna, because newer pandas versions keep NaN in stack)
#then map by the dict and convert the MultiIndex to columns
df = (df.mask(df == 0)
       .stack()
       .dropna()
       .map(mapp)
       .rename_axis(('Datetime','Shift', 'day'))
       .reset_index(name='td')
       )
#add the day offset and the start hour to Datetime
df['Datetime'] += (df.pop('td') + df.pop('day'))
#sort and create a default index
df = df.sort_values(['Datetime','Shift']).reset_index(drop=True)

print(df)
               Datetime    Shift
0   2019-01-01 06:00:00  Shift A
1   2019-01-01 18:00:00  Shift D
2   2019-01-02 06:00:00  Shift A
3   2019-01-02 18:00:00  Shift B
4   2019-01-03 06:00:00  Shift A
..                  ...      ...
113 2019-02-26 18:00:00  Shift D
114 2019-02-27 06:00:00  Shift A
115 2019-02-27 18:00:00  Shift B
116 2019-02-28 06:00:00  Shift A
117 2019-02-28 18:00:00  Shift B

[118 rows x 2 columns]
answered Nov 1 '22 at 04:11 by jezrael
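The answer's code assumes `df` already holds the sheet. Below is a self-contained sketch of the same steps on a small hand-built sample: the first three days of each month from the question's table, under the assumed layout where the month label is the header of the first column and the day numbers are the other column labels. It deviates from the answer in a few places: it parses month labels with an explicit `format="%B %Y"`, builds the index with `pd.MultiIndex.from_arrays`, and strips the `Shift ` prefix so the column holds `A`, `D`, ... as in the question's desired output.

```python
import pandas as pd

# Small stand-in for the real sheet: first 3 days of each month only.
# Assumed layout: column 0 holds the row labels, its header is the first
# month ("January 2019"), and the day numbers are the other column labels.
rows = [
    ["Shift A", 1, 1, 1],
    ["Shift B", 0, 2, 2],
    ["Shift C", 0, 0, 0],
    ["Shift D", 2, 0, 0],
    ["February 2019", 1, 2, 3],
    ["Shift A", 0, 0, 0],
    ["Shift B", 0, 0, 0],
    ["Shift C", 2, 2, 2],
    ["Shift D", 1, 1, 1],
]
df = pd.DataFrame(rows, columns=["January 2019", 1, 2, 3])

first = df.iloc[:, 0]
# Month labels parse to the 1st of the month; 'Shift X' labels become NaT.
dates = pd.to_datetime(first, format="%B %Y", errors="coerce")
is_shift_row = dates.isna()
# Every row gets the month it belongs to; the first block uses the header.
month = dates.ffill().fillna(pd.to_datetime(first.name, format="%B %Y"))

# Keep only the shift rows, indexed by (month start, team label).
body = df.iloc[:, 1:].copy()
body.index = pd.MultiIndex.from_arrays([month, first], names=["Datetime", "Shift"])
body = body[is_shift_row.to_numpy()]
# Day number d becomes an offset of d - 1 days from the month start.
body.columns = pd.to_timedelta(body.columns.astype(int) - 1, unit="D")

# 1 -> day shift (06:00), 2 -> night shift (18:00); 0 and NaN are dropped.
start = {1: pd.Timedelta(hours=6), 2: pd.Timedelta(hours=18)}
out = (
    body.mask(body == 0)
    .stack()
    .dropna()  # explicit, since newer pandas' stack() keeps NaN
    .astype(int)
    .map(start)
    .rename_axis(["Datetime", "Shift", "day"])
    .reset_index(name="td")
)
out["Datetime"] += out.pop("day") + out.pop("td")
out = out.sort_values(["Datetime", "Shift"]).reset_index(drop=True)
# The question asks for the bare team letter, e.g. 'A' instead of 'Shift A'.
out["Shift"] = out["Shift"].str.replace("Shift ", "", regex=False)

# Only one team works in a given period, so every start time must be unique.
assert not out["Datetime"].duplicated().any(), "overlapping shifts in the sheet"
print(out)
```

If the uniqueness check fails on the full sheet, two teams are marked for the same period, which contradicts the question's assumption and is worth inspecting before using the result.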