Replace various duplicate values with np.nan within multiple columns

Question

I am trying to replace duplicate values across multiple columns in a pandas df. For the df below I have dates and values respective to these dates. All values are the same for each date. I just want to keep the first value for each date and replace the following duplicate values with np.nan. Below is my attempt:

import pandas as pd
import numpy as np

d = ({
    'Date' : ['1/1/18','1/1/18','1/1/18','2/1/18','2/1/18','3/2/18','3/2/18','3/2/18'],                 
    'Val_D' : [10,10,10,22,22,10,10,10],      
    'Val_M' : [100,100,100,100,100,240,240,240],                                   
     })

df = pd.DataFrame(data = d)

df['Date'] = pd.to_datetime(df['Date'], format= '%d/%m/%y')

pd.Series([10,22,100,240]).duplicated()
dup = df.apply(pd.Series.duplicated, axis = 1)
df = df.where(~dup,np.nan)

print(df)

Intended Output:

     Date Val_D Val_M
0  1/1/18    10   100
1  1/1/18            
2  1/1/18            
3  2/1/18    22      
4  2/1/18            
5  3/2/18    10   240
6  3/2/18            
7  3/2/18

rafaelc · Accepted Answer

Well, one way is to simply use diff+ne

s = df[['Val_D', 'Val_M']]
df[['Val_D', 'Val_M']] = s[s.diff().ne(0)].fillna('')

Even though this yields your expected output, the dtypes of your columns become object and you lose your vectorization power with numbers. So I'd suggest you don't do that. Without the last piece fillna(''), you get

    Date        Val_D   Val_M
0   2018-01-01  10.0    100.0
1   2018-01-01  NaN     NaN
2   2018-01-01  NaN     NaN
3   2018-01-02  22.0    NaN
4   2018-01-02  NaN     NaN
5   2018-02-03  10.0    240.0
6   2018-02-03  NaN     NaN
7   2018-02-03  NaN     NaN

The dtypes are float. Now, with the fillna('') piece, you get

    Date        Val_D   Val_M
0   2018-01-01  10      100
1   2018-01-01      
2   2018-01-01      
3   2018-01-02  22  
4   2018-01-02      
5   2018-02-03  10      240
6   2018-02-03      
7   2018-02-03

with object dtypes.

BENY · Answer

I am using where with diff

df[['Val_D', 'Val_M']].where(df[['Val_D', 'Val_M']].diff().ne(0))
Out[1136]: 
   Val_D  Val_M
0   10.0  100.0
1    NaN    NaN
2    NaN    NaN
3   22.0    NaN
4    NaN    NaN
5   10.0  240.0
6    NaN    NaN
7    NaN    NaN
df[['Val_D', 'Val_M']]= df[['Val_D', 'Val_M']].where(df[['Val_D', 'Val_M']].diff().ne(0))

Replace various duplicate values with np.nan within multiple columns

Tags:

python

replace

pandas

duplicates

jonboy

2 Answers

rafaelc

BENY

Recent Activity

Donate For Us

Replace various duplicate values with np.nan within multiple columns

Tags:

python

replace

pandas

duplicates

jonboy

2 Answers

rafaelc

BENY

Related questions

Recent Activity

Donate For Us