
Filling specific duplicated values within the rows of a dataframe with NAs

Tags: r, duplicates

For each row of my data frame, I want to find every repeated occurrence of the value 4 (that is, every 4 after the first one in that row) and set it to NA.

My dataframe is like this:

dat <- read.table(text = "
   1  1  1  2  2  4  4  4
   1  2  1  1  4  4  4  4",
header = FALSE)

What I need to obtain is:

   1  1  1  2  2  4   NA  NA
   1  2  1  1  4  NA  NA  NA 

I have found information on how to eliminate duplicated rows or columns, but I do not know how to proceed here. Many thanks for any help.

Stefano Lombardi asked Dec 20 '22 10:12


2 Answers

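Sometimes you will want to avoid apply because it converts the data frame to a matrix first, which forces every column to a single type and so destroys the multi-class feature of data frame objects. This is a by approach that handles each row as a one-row data frame: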

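> # Group by row position rather than rownames(dat): row names are sorted as
> # strings ("1", "10", "2", ...), which would reorder frames with 10+ rows.
> do.call(rbind, by(dat, seq_len(nrow(dat)),
+         function(line) {line[duplicated(unlist(line)) & line == 4] <- NA; line}))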
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  1  1  2  2  4 NA NA
2  1  2  1  1  4 NA NA NA
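
To illustrate the coercion point, here is a minimal sketch using a hypothetical mixed-type data frame (the names df, n and s are made up for this example and are not part of the question). It compares the class that each row's values have inside apply with the column classes of a one-row subset:

```r
# Hypothetical data frame with one numeric and one character column
df <- data.frame(n = c(4, 4), s = c("a", "b"), stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() before looping, so each
# row reaches the function as a character vector
via_apply <- apply(df, 1, function(x) class(x))

# A one-row subset is still a data frame and keeps each column's class
via_row <- sapply(df[1, ], class)

print(via_apply)
print(via_row)
```

With an all-numeric data frame like dat the coercion is harmless, so this only matters once other column types are involved.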
IRTFM answered Dec 28 '22 09:12


which() and apply() are helpful here: for each row, find the positions of the 4s, drop the first position, and set the rest to NA.

> dat <- t(apply(dat, 1, function(X) {X[which(X==4)][-1] <- NA ; X})) 
> dat
[1,]  1  1  1  2  2  4 NA NA
[2,]  1  2  1  1  4 NA NA NA

The transpose (t) call is needed because apply returns each row's result as a column. There is probably a way to avoid it, though. Can anyone help me out?
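
One possible way around the transpose, offered as a sketch rather than a definitive answer: build a logical mask for the whole table and assign NA through it, so apply is never called. The helper names m, is4 and first are made up for this example. It relies on base R's max.col() with ties.method = "first" to locate the first 4 in each row, and on col() to compare column positions.

```r
dat <- read.table(text = "
1 1 1 2 2 4 4 4
1 2 1 1 4 4 4 4", header = FALSE)

# TRUE wherever a cell holds the target value (missing values count as FALSE)
m <- as.matrix(dat)
is4 <- !is.na(m) & m == 4

# Column of the first 4 in each row; a row without any 4 gets 1, which is
# harmless because is4 is FALSE everywhere in such a row
first <- max.col(is4 * 1, ties.method = "first")

# Keep the first 4 of each row and set every later 4 to NA
dat[is4 & col(is4) > first] <- NA
print(dat)
```

Because the assignment goes through a logical matrix, dat stays a data frame. The as.matrix() step still assumes the columns can be compared with 4 after conversion to a common type.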

Señor O answered Dec 28 '22 09:12