
Pandas: How to create a column that indicates when a value is present in another column a set number of rows in advance?

Tags: python, pandas

I'm trying to work out how to create a column in pandas that indicates, X rows in advance, when the next occurrence of a value in another column will happen. In essence it should behave as follows (in this instance X = 3):

df

rowid  event   indicator
1      True    1 # Event occurs
2      False   0
3      False   0
4      False   1 # Starts indicator
5      False   1
6      True    1 # Event occurs
7      False   0

The only approach I have so far is an iterative loop over the rows:

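import numpy as np

df['indicator'] = 0
col = df.columns.get_loc('indicator')
for z in np.flatnonzero(df['event']):
    # flag the event row and the X-1 rows before it (clipped at the top)
    df.iloc[max(z - X + 1, 0):z + 1, col] = 1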

However, this seems inefficient. Is there a more succinct way to achieve the example above? Thanks

James asked Nov 29 '19 08:11



2 Answers

Here's a NumPy-based approach using flatnonzero:

import numpy as np

X = 3
# 2-D array of positions where indicator should be set to 1
nd_ixs = np.flatnonzero(df.event)[:,None] - np.arange(X-1, -1, -1)
# flatten the positions
ixs = nd_ixs.ravel()
# filter out negative positions and set to 1
# (df.loc is label-based, so this assumes the default 0..n-1 index)
df['indicator'] = 0
df.loc[ixs[ixs>=0], 'indicator'] = 1

print(df)

    rowid  event  indicator
0      1   True          1
1      2  False          0
2      3  False          0
3      4  False          1
4      5  False          1
5      6   True          1
6      7  False          0

Here nd_ixs is obtained by broadcasting: the positions where event is True, as a column vector, minus the descending range X-1, ..., 0. Each row then holds the X positions ending at one event:

print(nd_ixs)

array([[-2, -1,  0],
       [ 3,  4,  5]], dtype=int64)
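
The broadcasting step above can also be wrapped in a small helper. This is only a sketch: the name lookahead_indicator and the rebuilt example frame are illustrative and not part of the answer. It writes into a plain NumPy array by position, so it should not depend on the frame having the default integer index.

```python
import numpy as np
import pandas as pd


def lookahead_indicator(event, X):
    """Return a 0/1 array marking each True in `event` and the X-1 positions before it.

    Hypothetical helper name; the logic follows the flatnonzero/arange idea above.
    """
    event = np.asarray(event, dtype=bool)
    # positions of the events as a column vector, shape (n_events, 1)
    hits = np.flatnonzero(event)[:, None]
    # offsets X-1, ..., 1, 0 as a row vector; broadcasting gives shape (n_events, X)
    offsets = np.arange(X - 1, -1, -1)
    ixs = (hits - offsets).ravel()
    out = np.zeros(event.shape[0], dtype=int)
    # negative positions would fall before the first row, so drop them
    out[ixs[ixs >= 0]] = 1
    return out


# example frame rebuilt from the question
df = pd.DataFrame({
    "rowid": range(1, 8),
    "event": [True, False, False, False, False, True, False],
})
df["indicator"] = lookahead_indicator(df["event"], X=3)
print(df)
```

Overlapping windows are harmless here, since setting the same position to 1 twice changes nothing.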
yatu answered Sep 19 '22 15:09


A pandas and numpy solution:

import numpy as np

# Make a variable shift: the series itself plus copies shifted
#  up by 1 .. X-1 rows, so each row also sees the next X-1 events.
#  fill_value=False keeps the dtype bool instead of introducing NaN.
def var_shift(series, X):
    return [series] + [series.shift(i, fill_value=False) for i in range(-X + 1, 0, 1)]

X = 3
# Set indicator to default to 1
df["indicator"] = 1

# Use pd.Series.where and np.logical_or with the
#  var_shift function to get a bool array, setting
#  0 where it is False
df["indicator"] = df["indicator"].where(
    np.logical_or.reduce(var_shift(df["event"], X)),
    0,
)

#    rowid  event  indicator
# 0      1   True          1
# 1      2  False          0
# 2      3  False          0
# 3      4  False          1
# 4      5  False          1
# 5      6   True          1
# 6      7  False          0

In [77]: np.logical_or.reduce(var_shift(df["event"], 3))
Out[77]: array([ True, False, False,  True,  True,  True, False])
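
To see why OR-ing the shifted copies works, it may help to lay them side by side. The sketch below is a variant of the same shift idea, not the code above: the lookahead frame and its event+k column names are made up for illustration, and fill_value=False is an assumption that rows shifted in from past the end count as "no event".

```python
import pandas as pd

# example frame rebuilt from the question
df = pd.DataFrame({
    "rowid": range(1, 8),
    "event": [True, False, False, False, False, True, False],
})

X = 3
# One column per look-ahead distance: k=0 is the row itself,
# k=1 is the next row's event, and so on up to k=X-1.
lookahead = pd.concat(
    {f"event+{k}": df["event"].shift(-k, fill_value=False) for k in range(X)},
    axis=1,
)
print(lookahead)

# A row is flagged when any of its look-ahead columns is True.
df["indicator"] = lookahead.any(axis=1).astype(int)
print(df)
```

Each row of lookahead answers "is there an event here, one row ahead, or two rows ahead?", which is the condition the indicator column encodes.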
Alex answered Sep 22 '22 15:09