How to identify only "not duplicated" rows

Tags: r, data.table

I have a situation like this: several data.tables combined with rbind.

library(data.table)
x <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, 8), status = c(FALSE, TRUE, FALSE, TRUE))
y <- data.table(id = c(1, 2, 3, 4), dsp = c(6, 6, 7, 8), status = c(FALSE, FALSE, FALSE, TRUE))
z <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 9, 8), status = c(FALSE, TRUE, FALSE, FALSE))
w <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, NA), status = c(FALSE, TRUE, FALSE, TRUE))
setkey(x, id)
setkey(y, id)
setkey(z, id)
setkey(w, id)
Bigdt <- rbind(x, y, z, w)

I would like to obtain ONLY the rows that are not repeated, like:

id  dsp status
1   6   FALSE
2   6   FALSE
3   9   FALSE
4   8   FALSE
4   NA  TRUE

So I tried

Resultdt<-Bigdt[!duplicated(Bigdt)]

but the result:

id  dsp status
1   5   FALSE
2   6   TRUE
3   7   FALSE
4   8   TRUE

does not match my expectations. I tried different methods (rbind is not mandatory), for example merge and joins. The data.table package seems the most likely to contain the solution. Any ideas?

asked May 27 '16 14:05

Antonello Salis

2 Answers

You can count how often each row occurs and keep the rows that occur only once:

Bigdt[, .N, by=names(Bigdt)][N == 1L][, N := NULL][]

   id dsp status
1:  1   6  FALSE
2:  2   6  FALSE
3:  3   9  FALSE
4:  4   8  FALSE
5:  4  NA   TRUE

To see how it works, run just part of the DT[][][][] chain:

Bigdt[, .N, by=names(Bigdt)]
Bigdt[, .N, by=names(Bigdt)][N == 1L]
Bigdt[, .N, by=names(Bigdt)][N == 1L][, N := NULL]
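
The same chain can be split into named steps to inspect each intermediate table. This is only a sketch of the approach above: it rebuilds Bigdt from the question's data, and the names counts, once and result are illustrative, not part of the original answer.

```r
library(data.table)

# Rebuild the example data from the question.
x <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, 8), status = c(FALSE, TRUE, FALSE, TRUE))
y <- data.table(id = c(1, 2, 3, 4), dsp = c(6, 6, 7, 8), status = c(FALSE, FALSE, FALSE, TRUE))
z <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 9, 8), status = c(FALSE, TRUE, FALSE, FALSE))
w <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, NA), status = c(FALSE, TRUE, FALSE, TRUE))
Bigdt <- rbind(x, y, z, w)

# Step 1: one row per distinct (id, dsp, status) combination,
# with N holding the number of times it occurs in Bigdt.
counts <- Bigdt[, .N, by = names(Bigdt)]
print(counts)

# Step 2: keep only the combinations that occur exactly once.
once <- counts[N == 1L]

# Step 3: drop the helper column. copy() keeps `once` untouched,
# because := modifies a data.table by reference.
result <- copy(once)[, N := NULL]
print(result)
```

The trailing [] in the original one-liner is only there to print the result, since := returns its value invisibly.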

answered Sep 27 '22 16:09

Frank

You may also try

Bigdt[!(duplicated(Bigdt)|duplicated(Bigdt, fromLast=TRUE))]
#   id dsp status
#1:  1   6  FALSE
#2:  2   6  FALSE
#3:  3   9  FALSE
#4:  4   8  FALSE
#5:  4  NA   TRUE
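
To see why both directions are needed, here is a small sketch on a plain character vector (the vector v is made up for illustration). duplicated() alone never flags the first occurrence of a repeated value, and fromLast = TRUE never flags the last one; combining both flags every member of a repeated group.

```r
# Hypothetical example vector: "a" occurs three times, "b" and "c" once.
v <- c("a", "b", "a", "c", "a")

# Scanning forward: the first "a" is not flagged.
fwd <- duplicated(v)
# FALSE FALSE  TRUE FALSE  TRUE

# Scanning backward: the last "a" is not flagged.
bwd <- duplicated(v, fromLast = TRUE)
#  TRUE FALSE  TRUE FALSE FALSE

# A value is repeated if either scan flags it; negate to keep the rest.
kept <- v[!(fwd | bwd)]
print(kept)
# "b" "c"
```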

Or, using .SD:

Bigdt[Bigdt[,!(duplicated(.SD)|duplicated(.SD, fromLast=TRUE))]]

Another option is to group by all the column names, find the row indices of the groups that contain a single row with .I, and subset the dataset with them:

Bigdt[Bigdt[, .I[.N==1], by = names(Bigdt)]$V1]
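
A sketch of what the inner call returns, assuming Bigdt is built as in the question (the names idx_dt and idx are illustrative). Within each group, .I holds the row numbers of that group in Bigdt and .N its size, so .I[.N == 1] yields the single row number for groups of size one and nothing for larger groups.

```r
library(data.table)

# Rebuild the example data from the question.
x <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, 8), status = c(FALSE, TRUE, FALSE, TRUE))
y <- data.table(id = c(1, 2, 3, 4), dsp = c(6, 6, 7, 8), status = c(FALSE, FALSE, FALSE, TRUE))
z <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 9, 8), status = c(FALSE, TRUE, FALSE, FALSE))
w <- data.table(id = c(1, 2, 3, 4), dsp = c(5, 6, 7, NA), status = c(FALSE, TRUE, FALSE, TRUE))
Bigdt <- rbind(x, y, z, w)

# Inner step: the grouping columns plus V1, the row number in Bigdt
# of every row whose (id, dsp, status) combination occurs only once.
idx_dt <- Bigdt[, .I[.N == 1], by = names(Bigdt)]
idx <- idx_dt$V1
print(idx)

# Outer step: subset the original table by those row numbers.
print(Bigdt[idx])
```

Unlike the counting approach, this subsets the original rows directly, so no helper column has to be removed afterwards.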

answered Sep 27 '22 18:09

akrun
