I'm am doing some coding in R. I would want to show the rows that have duplicates for columns ID and NAME but have different values for AGE. For example I have this table: <pre class="prettyprint"><code>ID | NAME | AGE 111| Mark| 22 222| Anne| 21 333| Chery| 30 444| Megan| 16 555| Charles| 37 111| Mark| 23 222| Anne| 22 333| Chery| 30 111| Mark| 22 </code></pre> As of now I have this code: <pre class="prettyprint"><code>readfile <- read.csv(file='/home/user/shane/names.csv') dat <- data.frame(ID=c(readfile$ID),NAME=c(readfile$NAME),AGE=c(readfile$AGE)) nam <- duplicated(dat[,c('ID','NAME)]) | duplicated(dat[,c('ID','NAME], fromLast = TRUE) readfile[nam,] </code></pre> The output looks like this: <pre class="prettyprint"><code>ID | NAME | AGE 111| Mark| 22 222| Anne| 21 333| Chery| 30 111| Mark| 23 222| Anne| 22 333| Chery| 30 111| Mark| 22 </code></pre> I would want the output to be: <pre class="prettyprint"><code>ID | NAME | AGE 111| Mark| 22 222| Anne| 21 111| Mark| 23 222| Anne| 22 111| Mark| 22 </code></pre> I would want to remove the columns with the ID = 333 as they have the same value in Age. would anyone have a suggestion?

I just tweaked your code :) <pre class="prettyprint lang-r prettyprint-override"><code>library(plyr) dat1 <- ddply(dat, .(ID, NAME, AGE), nrow) dat2 <- merge(dat1, dat, by=c("ID", "NAME", "AGE")) dat3 <- dat2[!(!duplicated(dat2[, 1:2], fromLast=T) & !duplicated(dat2[, 1:2])),] dat3[dat3$ID %in% dat3[dat3$V1 == 1, 1], 1:3] </code></pre> Output is: <pre class="prettyprint"><code> ID NAME AGE 1 111 Mark 22 2 111 Mark 22 3 111 Mark 23 4 222 Anne 21 5 222 Anne 22 </code></pre> Sample data: <pre class="prettyprint"><code>dat <- data.frame(ID=c(111,222,333,444,555,111,222,333,111), NAME=c('Mark','Anne','Chery','Megan','Charles','Mark','Anne','Chery','Mark'), AGE=c(22,21,30,16,37,23,22,30,22)) # ID NAME AGE #1 111 Mark 22 #2 222 Anne 21 #3 333 Chery 30 #4 444 Megan 16 #5 555 Charles 37 #6 111 Mark 23 #7 222 Anne 22 #8 333 Chery 30 #9 111 Mark 22 </code></pre> Update: Corrected formatting for better reading

A <code>dplyr</code> solution: <pre class="prettyprint"><code>library(dplyr) dat %>% group_by(ID, NAME) %>% filter(n() > 1, sum(duplicated(AGE)) == 0) %>% ungroup() # A tibble: 4 x 3 ID NAME AGE <dbl> <fctr> <dbl> 1 111 Mark 22 2 222 Anne 21 3 111 Mark 23 4 222 Anne 22 </code></pre> I used the data kindly provided by @Prem.

Extract rows that have duplicates for certain column but are unique in another column

I'm am doing some coding in R. I would want to show the rows that have duplicates for columns ID and NAME but have different values for AGE.

For example I have this table:

ID |    NAME | AGE
111|     Mark| 22
222|     Anne| 21
333|    Chery| 30
444|    Megan| 16
555|  Charles| 37
111|     Mark| 23
222|     Anne| 22
333|    Chery| 30
111|     Mark| 22

As of now I have this code:

readfile <- read.csv(file='/home/user/shane/names.csv')
dat <- data.frame(ID=c(readfile$ID),NAME=c(readfile$NAME),AGE=c(readfile$AGE))
nam <- duplicated(dat[,c('ID','NAME)]) | duplicated(dat[,c('ID','NAME], fromLast = TRUE)
readfile[nam,]

The output looks like this:

ID |    NAME | AGE
111|     Mark| 22
222|     Anne| 21
333|    Chery| 30
111|     Mark| 23
222|     Anne| 22
333|    Chery| 30
111|     Mark| 22

I would want the output to be:

ID |    NAME | AGE
111|     Mark| 22
222|     Anne| 21
111|     Mark| 23
222|     Anne| 22
111|     Mark| 22

I would want to remove the columns with the ID = 333 as they have the same value in Age. would anyone have a suggestion?

How can I get unique values from a column with repeated values in SQL?

The SELECT DISTINCT statement is used to return only distinct (different) values. Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different (distinct) values.

I just tweaked your code :)

library(plyr) 

dat1 <- ddply(dat, .(ID, NAME, AGE), nrow) 
dat2 <- merge(dat1, dat, by=c("ID", "NAME", "AGE")) 
dat3 <- dat2[!(!duplicated(dat2[, 1:2], fromLast=T) & !duplicated(dat2[, 1:2])),] 
dat3[dat3$ID %in% dat3[dat3$V1 == 1, 1], 1:3]

Output is:

   ID NAME AGE
1 111 Mark  22
2 111 Mark  22
3 111 Mark  23
4 222 Anne  21
5 222 Anne  22

Sample data:

dat <- data.frame(ID=c(111,222,333,444,555,111,222,333,111), 
                  NAME=c('Mark','Anne','Chery','Megan','Charles','Mark','Anne','Chery','Mark'), 
                  AGE=c(22,21,30,16,37,23,22,30,22)) 
#   ID    NAME AGE
#1 111    Mark  22
#2 222    Anne  21
#3 333   Chery  30
#4 444   Megan  16
#5 555 Charles  37
#6 111    Mark  23
#7 222    Anne  22
#8 333   Chery  30
#9 111    Mark  22

Update: Corrected formatting for better reading

A dplyr solution:

library(dplyr)
dat %>%
    group_by(ID, NAME) %>%
    filter(n() > 1, sum(duplicated(AGE)) == 0) %>%
    ungroup()
# A tibble: 4 x 3
     ID   NAME   AGE
  <dbl> <fctr> <dbl>
1   111   Mark    22
2   222   Anne    21
3   111   Mark    23
4   222   Anne    22

I used the data kindly provided by @Prem.

Extract rows that have duplicates for certain column but are unique in another column

Tags:

r

duplicates

unique

shane1717

People also ask

2 Answers

1.618

mt1022

Recent Activity

Donate For Us

Extract rows that have duplicates for certain column but are unique in another column

Tags:

r

duplicates

unique

shane1717

People also ask

2 Answers

1.618

mt1022

Related questions

Recent Activity

Donate For Us