
How to remove duplicated (by name) columns in a data.table in R?

Tags:

r

data.table

While reading a data set using fread, I've noticed that I sometimes get duplicated column names, for example (fread doesn't have a check.names argument):

> data.table( x = 1, x = 2)
   x x
1: 1 2

The question is: is there any way to remove one of the two columns when they have the same name?

Marcin Kosiński asked Mar 16 '15 21:03



1 Answer

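.SDcols approaches would return a copy of the columns you're selecting. Instead, just remove the duplicated columns by reference using :=. Since duplicated() flags every occurrence of a name after the first, the first column with each name is kept.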

dt[, which(duplicated(names(dt))) := NULL]
#    x
# 1: 1
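
For a table with more columns, the same idea can be sketched as below. This is only an illustration: the tables dt and dt2, their values, and the helper variable dup_idx are made up for the example, and the length check is a defensive assumption rather than something the answer requires. The second part shows an alternative for when both columns should be kept, renaming them with base R's make.unique() instead of dropping one.

```r
library(data.table)

# Hypothetical example table with a repeated column name.
dt <- data.table(x = 1, y = 3, x = 2)

# Positions of columns whose name already appeared further left.
dup_idx <- which(duplicated(names(dt)))

# Delete those columns by reference; the first "x" is kept.
# The length check is a precaution (an assumption of this sketch).
if (length(dup_idx) > 0L) {
  dt[, (dup_idx) := NULL]
}
names(dt)

# Alternative: keep every column but make the names unique, by reference.
dt2 <- data.table(x = 1, y = 3, x = 2)
setnames(dt2, make.unique(names(dt2)))
names(dt2)
```

Deleting by position rather than by name avoids any ambiguity about which of the identically named columns is targeted.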

Arun answered Sep 22 '22 12:09