I have something similar to this:
date pgm in.x logs out.y
20130514 na 12 j1 12
20131204 z2 03 j1 03
20130516 a01 04 j0 04
20130628 z1 05 j2 05
I noticed that the in and out values are always the same, so I want to delete the out.y column. I have other columns like this too, so I want to be able to detect any .y columns that match .x columns and delete them after I do the merge.
The unique() function in R removes duplicated elements or rows from a vector, data frame, or array.
If we assume all redundant columns should be removed, then
no_duplicate <- data_set[!duplicated(as.list(data_set))]
will do the trick.
as.list will convert the data.frame to a list of all its columns, and duplicated will return a logical vector that is TRUE for each column whose values are all a duplicate of a previously seen column.
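To make this concrete, here is a minimal sketch on a small data frame that mirrors the table in the question. The values are typed in by hand as an assumption about how the real data looks (the in/out columns are stored as character to keep the leading zeros), and dup_cols is just an illustrative name.

```r
# Hypothetical data frame modeled on the question's table
data_set <- data.frame(
  date  = c(20130514, 20131204, 20130516, 20130628),
  pgm   = c("na", "z2", "a01", "z1"),
  in.x  = c("12", "03", "04", "05"),
  logs  = c("j1", "j1", "j0", "j2"),
  out.y = c("12", "03", "04", "05"),
  stringsAsFactors = FALSE
)

# One logical value per column: TRUE marks a column whose values
# repeat those of a column further to the left
dup_cols <- duplicated(as.list(data_set))
print(dup_cols)

# Keep only the columns that are not flagged
no_duplicate <- data_set[!dup_cols]
print(names(no_duplicate))
```

Because duplicated only flags the later copy, in.x is kept and out.y is the one that gets dropped.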
This does not directly try to compare .x and .y columns, but has the effect of retaining one copy of each duplicated column, which I assume is the main goal. On the other hand, it will also remove any .x columns that are duplicates of another .x column.
If we want to retain all .x columns, even those that are duplicates, a good solution might be to filter before the merge. Assuming you have data_x and data_y that will be merged by column "identifier":
data_y_nonredundant <- data_y[!(as.list(data_y) %in% as.list(data_x) & names(data_y)!="identifier")]
data <- merge(data_x, data_y_nonredundant, by=c("identifier"))
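As a rough illustration of the pre-merge filtering, the sketch below uses two small made-up data frames; the column names value, score and extra and the helper vector keep are hypothetical. Note that the columns are compared as whole vectors, so this assumes the rows of data_x and data_y are already in the same order.

```r
# Hypothetical inputs sharing the key column "identifier"
data_x <- data.frame(identifier = 1:3,
                     value = c(10, 20, 30),
                     score = c(1, 2, 3))
data_y <- data.frame(identifier = 1:3,
                     value = c(10, 20, 30),   # same values as data_x$value
                     extra = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

# Drop a data_y column when its values match some data_x column,
# but always keep the key column needed for the merge
keep <- !(as.list(data_y) %in% as.list(data_x) & names(data_y) != "identifier")
data_y_nonredundant <- data_y[keep]

data <- merge(data_x, data_y_nonredundant, by = "identifier")
print(names(data))
```

Here the redundant value column from data_y is removed before merging, so the result has no value.x / value.y pair to clean up afterwards.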