I'm trying to ascertain how I can create a column that indicates in advance (X rows) when the next occurrence of a value in another column will occur with pandas that in essence performs the following functionality (In this instance X = 3):
df
rowid event indicator
1 True 1 # Event occurs
2 False 0
3 False 0
4 False 1 # Starts indicator
5 False 1
6 True 1 # Event occurs
7 False 0
Apart from doing a iterative/recursive loop through every row:
i = df.index[df['event']==True]
dfx = [df.index[z-X:z] for z in i]
df['indicator'][dfx]=1
df['indicator'].fillna(0)
However this seems inefficient, is there a more succinct method of achieving the aforementioned example? Thanks
Use in operator on a Series to check if a column contains/exists a string value in a pandas DataFrame. df['Courses'] returns a Series object with all values from column Courses , pandas. Series. unique will return unique values of the Series object.
You can use the assign() function to add a new column to the end of a pandas DataFrame: df = df. assign(col_name=[value1, value2, value3, ...])
You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
Here's a NumPy
based approach using flatnonzero:
X = 3
# ndarray of indices where indicator should be set to one
nd_ixs = np.flatnonzero(df.event)[:,None] - np.arange(X-1, -1, -1)
# flatten the indices
ixs = nd_ixs.ravel()
# filter out negative indices an set to 1
df['indicator'] = 0
df.loc[ixs[ixs>=0], 'indicator'] = 1
print(df)
rowid event indicator
0 1 True 1
1 2 False 0
2 3 False 0
3 4 False 1
4 5 False 1
5 6 True 1
6 7 False 0
Where nd_ixs
is obtained through the broadcasted subtraction of the indices where event
is True
and an arange up to X
:
print(nd_ixs)
array([[-2, -1, 0],
[ 3, 4, 5]], dtype=int64)
A pandas
and numpy
solution:
# Make a variable shift:
def var_shift(series, X):
return [series] + [series.shift(i) for i in range(-X + 1, 0, 1)]
X = 3
# Set indicator to default to 1
df["indicator"] = 1
# Use pd.Series.where and np.logical_or with the
# var_shift function to get a bool array, setting
# 0 when False
df["indicator"] = df["indicator"].where(
np.logical_or.reduce(var_shift(df["event"], X)),
0,
)
# rowid event indicator
# 0 1 True 1
# 1 2 False 0
# 2 3 False 0
# 3 4 False 1
# 4 5 False 1
# 5 6 True 1
# 6 7 False 0
In [77]: np.logical_or.reduce(var_shift(df["event"], 3))
Out[77]: array([True, False, False, True, True, True, nan], dtype=object)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With