I have a situation like this: several data.tables combined with rbind.
library(data.table)
x <- data.table(id=c(1,2,3,4),dsp=c(5,6,7,8),status=c(FALSE,TRUE,FALSE,TRUE))
y <- data.table(id=c(1,2,3,4),dsp=c(6,6,7,8),status=c(FALSE,FALSE,FALSE,TRUE))
z <- data.table(id=c(1,2,3,4),dsp=c(5,6,9,8),status=c(FALSE,TRUE,FALSE,FALSE))
w <- data.table(id=c(1,2,3,4),dsp=c(5,6,7,NA),status=c(FALSE,TRUE,FALSE,TRUE))
setkey(x,id)
setkey(y,id)
setkey(z,id)
setkey(w,id)
Bigdt<-rbind(x,y,z,w)
I would like to keep ONLY the rows that are not repeated, like this:
id dsp status
1 6 FALSE
2 6 FALSE
3 9 FALSE
4 8 FALSE
4 NA TRUE
So I tried
Resultdt<-Bigdt[!duplicated(Bigdt)]
but the result:
id dsp status
1 5 FALSE
2 6 TRUE
3 7 FALSE
4 8 TRUE
does not match my expectations. I tried different approaches (rbind is not mandatory), for example merge and joins. The data.table package looks like the one most likely to contain the solution. Any ideas?
You can do
Bigdt[, .N, by=names(Bigdt)][N == 1L][, N := NULL][]
id dsp status
1: 1 6 FALSE
2: 2 6 FALSE
3: 3 9 FALSE
4: 4 8 FALSE
5: 4 NA TRUE
To see how it works, run just part of the DT[][][][] chain:
Bigdt[, .N, by=names(Bigdt)]
Bigdt[, .N, by=names(Bigdt)][N == 1L]
Bigdt[, .N, by=names(Bigdt)][N == 1L][, N := NULL]
You may also try
Bigdt[!(duplicated(Bigdt)|duplicated(Bigdt, fromLast=TRUE))]
# id dsp status
#1: 1 6 FALSE
#2: 2 6 FALSE
#3: 3 9 FALSE
#4: 4 8 FALSE
#5: 4 NA TRUE
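Why both calls are needed can be seen on a plain vector. This is a minimal sketch in base R; the vector v and the names fwd, bwd and once are made up for illustration. duplicated() flags only the second and later copies of a value, so the first copy survives a plain !duplicated() filter. Scanning again with fromLast=TRUE flags every copy except the last one, and combining the two flags every value that occurs more than once.

```r
# Hypothetical example vector: "a" occurs 3 times, "b" twice, "c" once
v <- c("a", "b", "a", "c", "b", "a")

# Forward scan: TRUE for every copy after the first occurrence
fwd <- duplicated(v)

# Backward scan: TRUE for every copy before the last occurrence
bwd <- duplicated(v, fromLast = TRUE)

# A value that occurs exactly once is flagged by neither scan
once <- v[!(fwd | bwd)]

# !duplicated() alone keeps one copy of each value instead
first_copies <- v[!fwd]

print(once)
print(first_copies)
```

The same logic applies row-wise when duplicated() is called on a data.table.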
Or, using .SD:
Bigdt[Bigdt[,!(duplicated(.SD)|duplicated(.SD, fromLast=TRUE))]]
Another option is to group by all the column names, find the row index of each single-row group with .I, and subset the dataset:
Bigdt[Bigdt[, .I[.N==1], by = names(Bigdt)]$V1]
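As a quick sanity check, the approaches above can be compared on the data from the question. This is only a sketch: the names by_count, by_dup and by_index are introduced here for illustration, the setkey() calls from the question are left out because none of the three expressions relies on a key, and attributes are ignored in the comparison so that only the row contents are compared.

```r
library(data.table)

# Data from the question (setkey() calls omitted in this sketch)
x <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, 8), status = c(FALSE, TRUE, FALSE, TRUE))
y <- data.table(id = c(1, 2, 3, 4), dsp = c(6, 6, 7, 8), status = c(FALSE, FALSE, FALSE, TRUE))
z <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 9, 8), status = c(FALSE, TRUE, FALSE, FALSE))
w <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, NA), status = c(FALSE, TRUE, FALSE, TRUE))
Bigdt <- rbind(x, y, z, w)

# 1. Count each distinct row, keep the groups of size 1, drop the counter
by_count <- Bigdt[, .N, by = names(Bigdt)][N == 1L][, N := NULL][]

# 2. Drop every row that has a copy before or after it
by_dup <- Bigdt[!(duplicated(Bigdt) | duplicated(Bigdt, fromLast = TRUE))]

# 3. Collect the row index of each single-row group, then subset
by_index <- Bigdt[Bigdt[, .I[.N == 1L], by = names(Bigdt)]$V1]

# Compare row contents only (attributes such as keys are ignored)
same_1_2 <- isTRUE(all.equal(by_count, by_dup, check.attributes = FALSE))
same_1_3 <- isTRUE(all.equal(by_count, by_index, check.attributes = FALSE))
print(c(same_1_2, same_1_3))
```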