Compare the previous N rows to the current row in a pandas column

Tags: python, pandas, dataframe

I am trying to populate a new column in my pandas dataframe by considering the values of the previous n rows. If the current value is not equal to any of the past n values in that column, it should populate "N", else "Y".

Please let me know what would be a good way to achieve this.

Here's my input data:

import pandas as pd

testdata = {'col1': ['car', 'car', 'car', 'bus', 'bus', 'bus', 'car']}
df = pd.DataFrame.from_dict(testdata)

Input DF:

  col1
0  car
1  car
2  car
3  bus
4  bus
5  bus
6  car

Output DF (with n=2):

  col1   Result
0  car         
1  car         
2  car      Y  
3  bus      N  
4  bus      Y  
5  bus      Y  
6  car      N


asked Jun 13 '19 03:06

tipsydeepz

2 Answers

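Here is my way:

import numpy as np

n = 2
# The first n rows have no full history, so they are marked False;
# every later row checks membership in the previous n values
l = [False] * n + [df.iloc[x, 0] in df.iloc[x - n:x, 0].tolist() for x in np.arange(n, len(df))]
df['New'] = l
df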
  col1    New
0  car  False
1  car  False
2  car   True
3  bus  False
4  bus   True
5  bus   True
6  car  False
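
If the 'Y'/'N' labels and the blank first n rows from the question are wanted, the same membership check could be wrapped in a small function. This is only a sketch: `seen_in_previous` is a hypothetical helper name, not part of pandas, and it assumes the column fits comfortably in a Python list.

```python
import pandas as pd


def seen_in_previous(values, n):
    """Label each value 'Y' if it occurs among the previous n values, else 'N'.

    The first n positions have no full history and are left blank.
    (Hypothetical helper for illustration only.)
    """
    values = list(values)
    labels = []
    for i, value in enumerate(values):
        if i < n:
            labels.append('')
        else:
            labels.append('Y' if value in values[i - n:i] else 'N')
    return labels


df = pd.DataFrame({'col1': ['car', 'car', 'car', 'bus', 'bus', 'bus', 'car']})
df['Result'] = seen_in_previous(df['col1'], n=2)
```

A plain loop like this is easy to read, but it does O(n) work per row in Python, so it may be slow on very large frames.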


answered Oct 23 '22 07:10

BENY

You can do this with a Rolling.apply call.

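import numpy as np

n = 2
# Encode the strings as integer codes, then test whether the last value
# of each (n+1)-row window appears among the n values before it
res = (df['col1'].astype('category')
                 .cat.codes
                 .rolling(n+1)
                 .apply(lambda x: x[-1] in x[:-1], raw=True))

# The first n rows are NaN (incomplete window), so they fall through to 'N'
df['Result'] = np.where(res == 1, 'Y', 'N')
df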

  col1 Result
0  car      N
1  car      N
2  car      Y
3  bus      N
4  bus      Y
5  bus      Y
6  car      N

Rolling only works with numeric data, so the initial step is to factorise the column. This can be done in many ways; I've used astype('category') and then extracted the codes.
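
To illustrate the factorising step, here is a small sketch on the same sample data. It compares the codes from astype('category') with those from pd.factorize, which is an alternative assumed here for illustration and not used in the answer itself. The two number the values differently, but only equality between codes matters for the rolling check, so either should work.

```python
import pandas as pd

s = pd.Series(['car', 'car', 'car', 'bus', 'bus', 'bus', 'car'])

# Categorical codes follow the sorted category order: bus -> 0, car -> 1
cat_codes = s.astype('category').cat.codes

# factorize numbers values in order of first appearance: car -> 0, bus -> 1
fact_codes, uniques = pd.factorize(s)

n = 2
# Same rolling membership test, run on the factorize codes
res = (pd.Series(fact_codes, index=s.index)
         .rolling(n + 1)
         .apply(lambda x: x[-1] in x[:-1], raw=True))
```

The first n entries of `res` are NaN because their window is incomplete; the rest are 1.0 or 0.0.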

Another option is to use pd.Categorical for the conversion. Here the result is mapped with res.map, which leaves the first n rows as NaN rather than 'N':

res = (df.assign(col1=pd.Categorical(df['col1']).codes)['col1']
         .rolling(n+1)
         .apply(lambda x: x[-1] in x[:-1], raw=True))

df['Result'] = res.map({1: 'Y', 0: 'N'})
df

  col1 Result
0  car    NaN
1  car    NaN
2  car      Y
3  bus      N
4  bus      Y
5  bus      Y
6  car      N
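
For small n, a different technique that avoids both rolling and the numeric conversion might be to compare the column against shifted copies of itself. This is a sketch under that assumption, not the method used above; it blanks the first n rows to match the output requested in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['car', 'car', 'car', 'bus', 'bus', 'bus', 'car']})
n = 2
s = df['col1']

# One boolean column per lag k = 1..n: does the row equal the value k rows back?
matches = pd.concat([s.eq(s.shift(k)) for k in range(1, n + 1)], axis=1).any(axis=1)

# Translate to 'Y'/'N', then blank out the first n rows (no full history yet)
labels = pd.Series(np.where(matches, 'Y', 'N'), index=df.index)
df['Result'] = labels.where(np.arange(len(df)) >= n, '')
```

This builds n shifted copies of the column, so memory grows with n; for large windows the rolling approach is likely the better fit. Missing values are never treated as a match here, since NaN does not compare equal to anything.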

answered Oct 23 '22 05:10

cs95
