Pandas: How to create a column that indicates when a value is present in another column a set number of rows in advance?

Tags: python, pandas

With pandas, how can I create a column that flags, X rows in advance, the next occurrence of a value in another column? The indicator should be 1 on the row where the event occurs and on the X - 1 rows before it, as in this example (X = 3):

rowid  event   indicator
1      True    1 # Event occurs
2      False   0
3      False   0
4      False   1 # Starts indicator
5      False   1
6      True    1 # Event occurs
7      False   0

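The only approach I have so far is an iterative loop over the event rows:

import numpy as np

i = np.flatnonzero(df['event'])  # positions of the rows where the event occurs
df['indicator'] = 0
for z in i:
    # mark the event row and the X - 1 rows before it
    df.iloc[max(z - X + 1, 0):z + 1, df.columns.get_loc('indicator')] = 1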

This seems inefficient, though. Is there a more succinct way to achieve the example above? Thanks

239

asked Nov 29 '19 08:11

James

2 Answers

Here's a NumPy-based approach using flatnonzero:

import numpy as np

X = 3
# 2-D array of row positions where the indicator should be set to 1
nd_ixs = np.flatnonzero(df.event)[:, None] - np.arange(X - 1, -1, -1)
# flatten the indices
ixs = nd_ixs.ravel()
# filter out negative indices and set to 1
# (df.loc works here because df has the default RangeIndex, so labels equal positions)
df['indicator'] = 0
df.loc[ixs[ixs >= 0], 'indicator'] = 1

print(df)

    rowid  event  indicator
0      1   True          1
1      2  False          0
2      3  False          0
3      4  False          1
4      5  False          1
5      6   True          1
6      7  False          0

Here nd_ixs comes from broadcasting: the column vector of positions where event is True, minus a descending arange from X - 1 down to 0, gives one row of X positions per event:

print(nd_ixs)

[[-2 -1  0]
 [ 3  4  5]]
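
To make the broadcasting step concrete, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and wraps the steps in a helper. The name add_indicator is hypothetical, and the sketch assigns by integer position through a NumPy array, so it does not rely on the frame having the default RangeIndex.

```python
import numpy as np
import pandas as pd


def add_indicator(df, X):
    """Return a copy of df with an 'indicator' column (hypothetical helper)."""
    # positions (not labels) of the rows where the event occurs
    event_pos = np.flatnonzero(df["event"].to_numpy())
    # one row per event: the event position and the X - 1 positions before it
    window = event_pos[:, None] - np.arange(X - 1, -1, -1)
    ixs = window.ravel()
    # drop positions that fall before the first row
    ixs = ixs[ixs >= 0]
    indicator = np.zeros(len(df), dtype=int)
    indicator[ixs] = 1
    out = df.copy()
    out["indicator"] = indicator
    return out


df = pd.DataFrame({
    "rowid": range(1, 8),
    "event": [True, False, False, False, False, True, False],
})
result = add_indicator(df, X=3)
print(result["indicator"].tolist())
```

Because the assignment is positional, the same call should also work on a frame indexed by rowid or by any other labels.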

105

answered Sep 19 '22 15:09

yatu

A pandas and numpy solution:

import numpy as np

# Build the list of shifted series: the original plus X - 1 backward shifts
def var_shift(series, X):
    # fill_value=False keeps the dtype boolean instead of introducing NaN
    return [series] + [series.shift(i, fill_value=False) for i in range(-X + 1, 0)]

X = 3
# Set indicator to default to 1
df["indicator"] = 1

# Use pd.Series.where and np.logical_or with the
#  var_shift function to get a bool array, setting
#  0 when False
df["indicator"] = df["indicator"].where(
    np.logical_or.reduce(var_shift(df["event"], X)),
    0,
)

#    rowid  event  indicator
# 0      1   True          1
# 1      2  False          0
# 2      3  False          0
# 3      4  False          1
# 4      5  False          1
# 5      6   True          1
# 6      7  False          0

In [77]: np.logical_or.reduce(var_shift(df["event"], 3))
Out[77]: array([ True, False, False,  True,  True,  True, False])
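
As a cross-check on the mask above, the same indicator can be sketched with a reversed rolling maximum. This is a different technique from the shift-based one in this answer, not a restatement of it: the event column is cast to integers, reversed, reduced with a rolling max over X rows, and reversed back. The sample frame is rebuilt from the question so the sketch is self-contained.

```python
import pandas as pd

X = 3
df = pd.DataFrame({
    "rowid": range(1, 8),
    "event": [True, False, False, False, False, True, False],
})

# Reverse the column so a trailing window looks "forward" in the original order
reversed_events = df["event"].astype(int).iloc[::-1]
# Rolling max over X rows; min_periods=1 avoids NaN at the start of the window
flag = reversed_events.rolling(X, min_periods=1).max().iloc[::-1]
# Assignment aligns on the index, so the restored order matches df
df["indicator"] = flag.astype(int)

print(df["indicator"].tolist())
```

The window includes the current row, so each event row is flagged together with the X - 1 rows before it.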

answered Sep 22 '22 15:09

Alex
