Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace various duplicate values with np.nan within multiple columns

I am trying to replace duplicate values across multiple columns in a pandas df. For the df below I have dates and values respective to these dates. All values are the same for each date. I just want to keep the first value for each date and replace the following duplicate values with np.nan. Below is my attempt:

import pandas as pd
import numpy as np

d = ({
    'Date' : ['1/1/18','1/1/18','1/1/18','2/1/18','2/1/18','3/2/18','3/2/18','3/2/18'],                 
    'Val_D' : [10,10,10,22,22,10,10,10],      
    'Val_M' : [100,100,100,100,100,240,240,240],                                   
     })

df = pd.DataFrame(data = d)

df['Date'] = pd.to_datetime(df['Date'], format= '%d/%m/%y')

pd.Series([10,22,100,240]).duplicated()
dup = df.apply(pd.Series.duplicated, axis = 1)
df = df.where(~dup,np.nan)

print(df)

Intended Output:

     Date Val_D Val_M
0  1/1/18    10   100
1  1/1/18            
2  1/1/18            
3  2/1/18    22      
4  2/1/18            
5  3/2/18    10   240
6  3/2/18            
7  3/2/18
like image 710
jonboy Avatar asked Mar 19 '26 09:03

jonboy


2 Answers

Well, one way is to simply use diff+ne

s = df[['Val_D', 'Val_M']]
df[['Val_D', 'Val_M']] = s[s.diff().ne(0)].fillna('')

Even though this yields your expected output, the dtypes of your columns become object and you lose your vectorization power with numbers. So I'd suggest you don't do that. Without the last piece fillna(''), you get

    Date        Val_D   Val_M
0   2018-01-01  10.0    100.0
1   2018-01-01  NaN     NaN
2   2018-01-01  NaN     NaN
3   2018-01-02  22.0    NaN
4   2018-01-02  NaN     NaN
5   2018-02-03  10.0    240.0
6   2018-02-03  NaN     NaN
7   2018-02-03  NaN     NaN

The dtypes are float. Now, with the fillna('') piece, you get

    Date        Val_D   Val_M
0   2018-01-01  10      100
1   2018-01-01      
2   2018-01-01      
3   2018-01-02  22  
4   2018-01-02      
5   2018-02-03  10      240
6   2018-02-03      
7   2018-02-03      

with object dtypes.

like image 67
rafaelc Avatar answered Mar 21 '26 00:03

rafaelc


I am using where with diff

df[['Val_D', 'Val_M']].where(df[['Val_D', 'Val_M']].diff().ne(0))
Out[1136]: 
   Val_D  Val_M
0   10.0  100.0
1    NaN    NaN
2    NaN    NaN
3   22.0    NaN
4    NaN    NaN
5   10.0  240.0
6    NaN    NaN
7    NaN    NaN
df[['Val_D', 'Val_M']]= df[['Val_D', 'Val_M']].where(df[['Val_D', 'Val_M']].diff().ne(0))
like image 23
BENY Avatar answered Mar 20 '26 23:03

BENY



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!