R: remove rows based on a similarity check of two columns

Question

Input

row.no   column2    column3  column4
1        bb         ee       up
2        bb         ee       down
3        bb         ee       up
4        bb         yy       down
5        bb         zz       up

My rule removes rows 1, 2 and 3: column2 and column3 are identical across those rows, but column4 holds contradictory values (up and down).

How can I ask R to remove the rows that share the same values in column2 and column3 but have contradicting values in column4, so that the result is as follows:

row.no   column2    column3  column4
4        bb         yy       down
5        bb         zz       up

Andrie · Accepted Answer

The functions in package plyr really shine at this type of problem. Here is a solution using two lines of code.

Set up the data (kindly provided by @GavinSimpson)

dat <- structure(list(row.no = 1:5, column2 = structure(c(1L, 1L, 1L, 
1L, 1L), .Label = "bb", class = "factor"), column3 = structure(c(1L, 
1L, 1L, 2L, 3L), .Label = c("ee", "yy", "zz"), class = "factor"), 
    column4 = structure(c(2L, 1L, 2L, 1L, 2L), .Label = c("down", 
    "up"), class = "factor")), .Names = c("row.no", "column2", 
"column3", "column4"), class = "data.frame", row.names = c(NA, 
-5L))

Load the plyr package

library(plyr)

Use ddply to split, analyse and combine dat. The following line of code splits dat by each unique combination of column2 and column3 and analyses each piece separately. I then add a column called unique, which holds the number of unique values of column4 in each group. Finally, simple subsetting returns only those rows where unique==1 and drops column 5 (the helper column unique).

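# Per (column2, column3) group, count the distinct values of column4
df <- ddply(dat, .(column2, column3), transform, 
    row.no=row.no, unique=length(unique(column4)))
# Keep groups with a single column4 value; drop the helper column (column 5)
df[df$unique==1, -5]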

And the results:

  row.no column2 column3 column4
4      4      bb      yy    down
5      5      bb      zz      up
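
For comparison, the same rule can be sketched in base R without plyr. This is only an illustrative alternative, not part of the original answer. It assumes the example data is rebuilt with plain character columns, and the names n_distinct and result are made up for the example. ave() returns, for every row, the value computed for that row's group, so the count lines up with the rows of dat directly.

```r
# Example data from the question, rebuilt with character columns (assumption)
dat <- data.frame(
  row.no  = 1:5,
  column2 = c("bb", "bb", "bb", "bb", "bb"),
  column3 = c("ee", "ee", "ee", "yy", "zz"),
  column4 = c("up", "down", "up", "down", "up"),
  stringsAsFactors = FALSE
)

# ave() needs a numeric vector to fill, so encode column4 as integer codes.
# For each row, count the distinct column4 codes within its
# (column2, column3) group.
n_distinct <- ave(
  as.integer(factor(dat$column4)),
  dat$column2, dat$column3,
  FUN = function(x) length(unique(x))
)

# Keep only rows whose group has a single, non-contradictory column4 value
result <- dat[n_distinct == 1, ]
print(result)
```

Because no helper column is added to the data frame, nothing has to be dropped afterwards, and the original row order is kept.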

Tags: r, rows, plyr

Catherine
