 

Undo setkey() on data.table in R

I have a data.table (called data below) with 10 columns (C1, ..., C10), and I want to delete duplicate rows.

I accidentally ran setkey(data, C1), so now unique(data) only returns rows that are unique on column C1. I want to remove a row only if it is identical to another row in all of the columns C1, ..., C10.
Is there a way to undo the setkey() operation? I found this question, but it didn't solve my problem.

PS: I can work around the problem by setting all columns of my data.table as keys with setkeyv(data, paste0("C", 1:10)), but this is neither elegant nor practical.

hellter asked Jun 04 '16 09:06



1 Answer

First, you can use setkey(data, NULL) to remove the key.
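A minimal sketch of removing the key, assuming a small hypothetical two-column table in place of the question's data. Note that setkey() also physically reorders the rows; removing the key afterwards only drops the "sorted" attribute and does not restore the original row order.

```r
library(data.table)

# Hypothetical toy table standing in for the question's `data`
data <- data.table(C1 = c(2, 1, 2), C2 = c("b", "a", "b"))

setkey(data, C1)    # sets the key and sorts the rows by C1
print(key(data))    # "C1"

setkey(data, NULL)  # drops the key attribute
print(key(data))    # NULL

# Rows stay in the order setkey() put them in (C1 = 1, 2, 2)
print(data)
```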

Second, unique.data.table has a by argument that lets you choose on the fly which columns to compare (regardless of which key is currently set):

unique(data, by = paste0("C", 1:10))
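A small sketch of the by argument, using a hypothetical 10-column table in which rows 1 and 3 are identical on every column while row 2 matches them only on C1. Comparing on C1 alone collapses everything to one row; comparing on all ten columns keeps the two distinct rows. As far as I know, since data.table 1.9.8 unique() on a data.table compares all columns by default (by = seq_along(x)) rather than using the key, so in current versions plain unique(data) gives the same result as the all-columns call.

```r
library(data.table)

# Hypothetical data: rows 1 and 3 are full duplicates; row 2 differs in C2
data <- data.table(
  C1 = c(1, 1, 1), C2 = c(1, 2, 1), C3 = 3, C4 = 4, C5 = 5,
  C6 = 6, C7 = 7, C8 = 8, C9 = 9, C10 = 10
)
setkey(data, C1)  # the accidental key from the question

by_key <- unique(data, by = "C1")               # compares C1 only
by_all <- unique(data, by = paste0("C", 1:10))  # compares every column

print(nrow(by_key))  # 1
print(nrow(by_all))  # 2
```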

Third, to key on many columns, use setkeyv instead of setkey; it takes the column names as a character vector:

setkeyv(data, paste0("C", 1:10))
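A short sketch of setkeyv() with a column-name vector built programmatically, on a hypothetical three-column table. According to ?setkey, calling setkey(data) with no columns listed also keys on all columns, which is a shorter way to get the same key as the question's workaround.

```r
library(data.table)

# Hypothetical three-column table
data <- data.table(C1 = c(2, 1), C2 = c("b", "a"), C3 = c(20, 10))

cols <- paste0("C", 1:3)
setkeyv(data, cols)   # key on every listed column, in order
print(key(data))      # "C1" "C2" "C3"

setkeyv(data, NULL)   # NULL removes the key, as with setkey(data, NULL)
setkey(data)          # no columns given: all columns become the key
print(identical(key(data), names(data)))  # TRUE
```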

See ?setkey and ?unique.data.table for more details.

MichaelChirico answered Sep 21 '22 14:09