I'm trying to parse a logfile of our manufacturing process. Most of the time the process is run automatically but occasionally, the engineer needs to switch into manual mode to make some changes and then switches back to automatic control by the reactor software. When set to manual mode the logfile records the step as being "MAN.OP." instead of a number. Below is a representative example.
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
ser_orig = pd.Series(steps)
which results in
0 1
1 2
2 2
3 MAN.OP.
4 MAN.OP.
5 2
6 2
7 3
8 3
9 MAN.OP.
10 MAN.OP.
11 4
12 4
dtype: object
I need to detect the 'MAN.OP.' and make them distinct from each other. In this example, the two regions with values == 2 should be one region after detecting the manual mode section like this:
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
I have code that iterates over this series and produces the correct result when the series is passed to my object. The setter is:
@step_series.setter
def step_series(self, ss):
"""
On assignment, give the manual mode steps a unique name. Leave
the steps done on recipe the same.
"""
manual_mode = "MAN.OP."
new_manual_mode_text = "Manual_Mode_{}"
counter = 0
continuous = False
for i in ss.index:
if continuous and ss.at[i] != manual_mode:
continuous = False
counter += 1
elif not continuous and ss.at[i] == manual_mode:
continuous = True
ss.at[i] = new_manual_mode_text.format(str(counter))
elif continuous and ss.at[i] == manual_mode:
ss.at[i] = new_manual_mode_text.format(str(counter))
self._step_series = ss
but this iterates over the entire dataframe and is the slowest part of my code other than reading the logfile over the network.
How can I detect these non-unique sections and rename them uniquely without iterating over the entire series? The series is a column selection from a larger dataframe so adding extra columns is fine if needed.
For the completed answer I ended up with:
@step_series.setter
def step_series(self, ss):
pd.options.mode.chained_assignment = None
manual_mode = "MAN.OP."
new_manual_mode_text = "Manual_Mode_{}"
newManOp = (ss=='MAN.OP.') & (ss != ss.shift())
ss[ss == 'MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
self._step_series = ss
Here's one way:
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
steps = pd.Series(steps)
newManOp = (steps=='MAN.OP.') & (steps != steps.shift())
steps[steps=='MAN.OP.'] += seq.cumsum().astype(str)
>>> steps
0 1
1 2
2 2
3 MAN.OP.1
4 MAN.OP.1
5 2
6 2
7 3
8 3
9 MAN.OP.2
10 MAN.OP.2
11 4
12 4
dtype: object
To get the exact format you listed (starting from zero instead of one, and changing from "MAN.OP." to "Manual_mode_"), just tweak the last line:
steps[steps=='MAN.OP.'] = 'Manual_Mode_' + (seq.cumsum()-1).astype(str)
>>> steps
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
There a pandas enhancement request for contiguous groupby, which would make this type of task simpler.
There is s function in matplotlib that takes a boolean array and returns a list of (start, end) pairs. Each pair represents a contiguous region where the input is True
.
import matplotlib.mlab as mlab
regions = mlab.contiguous_regions(ser_orig == manual_mode)
for i, (start, end) in enumerate(regions):
ser_orig[start:end] = new_manual_mode_text.format(i)
ser_orig
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With