
Delete Redundant columns in R [duplicate]

Tags: merge, r

I have something similar to this:

date        pgm      in.x     logs       out.y
20130514    na       12       j1         12
20131204    z2       03       j1         03
20130516    a01      04       j0         04
20130628    z1       05       j2         05

I noticed that the in and out values are always the same, so I want to delete the out.y column. I have other columns like this as well, so after I do the merge I want to be able to detect any .y columns that match .x columns and delete them.

Chayma Atallah asked Jun 01 '16 09:06




1 Answer

If we assume all column redundancies should be removed

no_duplicate <- data_set[!duplicated(as.list(data_set))]

will do the trick.

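as.list converts the data.frame to a list of its columns, and duplicated returns a logical vector with TRUE for each column whose values are all identical to those of a previously seen column. Negating that vector and using it as a column index keeps only the first copy of each column.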

This does not directly try to compare .x and .y columns, but has the effect of retaining one copy of each duplicated column, which I assume is the main goal. On the other hand, it will also remove any .x columns that are duplicates of another .x column.
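As a quick illustration, here is a minimal sketch that applies the one-liner to a data frame modelled on the table in the question. The values are copied from that table for illustration only, and is_dup is just a name chosen here for the intermediate result.

```r
# Sample data modelled on the question's table (illustrative values only)
data_set <- data.frame(
  date  = c(20130514, 20131204, 20130516, 20130628),
  pgm   = c("na", "z2", "a01", "z1"),
  in.x  = c(12, 3, 4, 5),
  logs  = c("j1", "j1", "j0", "j2"),
  out.y = c(12, 3, 4, 5),
  stringsAsFactors = FALSE
)

# One logical per column: TRUE where the column repeats an earlier one
is_dup <- duplicated(as.list(data_set))

# Keep only the first copy of each column
no_duplicate <- data_set[!is_dup]
names(no_duplicate)
```

Here out.y is flagged because it repeats in.x, so it is the only column dropped. Note that the comparison is on whole columns, so two columns that differ in a single row are both kept.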


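If we want to retain all .x columns, even those that duplicate each other, a good solution might be to filter before the merge. This compares whole columns as they stand before merging, so it only detects redundancy when both data frames hold the same rows in the same order. Assuming you have data_x and data_y that will be merged by the column "identifier":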

data_y_nonredundant <- data_y[!(as.list(data_y) %in% as.list(data_x) & names(data_y) != "identifier")]
data <- merge(data_x, data_y_nonredundant, by = "identifier")
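If the check really has to happen after the merge, one possible approach is to pair each .y column with the .x column of the same base name and drop the .y copy when the two are identical. The following is only a sketch: drop_redundant_y is a hypothetical helper name, the sample data frames are made up, and it assumes the default ".x"/".y" suffixes that merge adds to clashing column names.

```r
# Hypothetical helper: drop every "<name>.y" column identical to "<name>.x"
drop_redundant_y <- function(df) {
  y_cols <- grep("\\.y$", names(df), value = TRUE)
  x_cols <- sub("\\.y$", ".x", y_cols)

  # Only consider .y columns that actually have an .x partner
  has_pair <- x_cols %in% names(df)
  y_cols <- y_cols[has_pair]
  x_cols <- x_cols[has_pair]

  # TRUE where the .x and .y columns hold exactly the same values
  same <- vapply(
    seq_along(y_cols),
    function(i) identical(df[[x_cols[i]]], df[[y_cols[i]]]),
    logical(1)
  )

  df[setdiff(names(df), y_cols[same])]
}

# Made-up example data: "value" matches in both frames, "score" does not
data_x <- data.frame(identifier = 1:4, value = c(12, 3, 4, 5), score = c(1, 2, 3, 4))
data_y <- data.frame(identifier = 1:4, value = c(12, 3, 4, 5), score = c(9, 2, 3, 4))

merged  <- merge(data_x, data_y, by = "identifier")
cleaned <- drop_redundant_y(merged)
names(cleaned)
```

In this example value.y is removed because it matches value.x, while score.y is kept because it differs in the first row. Columns whose names do not share a base name (such as in.x and out.y in the question) would not be paired by this helper; for those, the duplicated(as.list(...)) approach is the better fit.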

Alex A. answered Nov 01 '22 15:11