I have a data.table (data in the following) with 10 columns (C1, ..., C10) and I want to delete duplicate rows.
I accidentally used setkey(data, C1), so now when I run unique(data) I only get unique rows based on the column C1, while I want to remove a row only if it is identical to another row on all the columns C1, ..., C10.
Is there a way to undo the setkey() operation? I found this question but it didn't help me solve my problem.
PS: I can get around the problem by setting all columns in my data.table as keys with setkeyv(data, paste0("C", 1:10)), but this is not at all an elegant or practical solution.
From the Description section of ?setkey: setkey sorts a data.table and marks it as sorted with an attribute sorted. The sorted columns are the key. The key can be any number of columns.
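A minimal sketch of what that description means in practice, on a hypothetical two-column table (the names and values are illustrative only): the rows are physically reordered, and the key is nothing more than the "sorted" attribute, which key() reads back.

```r
library(data.table)

# Hypothetical toy table, not the asker's data
data <- data.table(C1 = c(3, 1, 2), C2 = c("x", "y", "z"))
setkey(data, C1)

data$C1               # rows are now sorted by C1: 1 2 3
attr(data, "sorted")  # "C1": the key is stored in this attribute
key(data)             # same value, via the accessor function
```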
First, you can use setkey(data, NULL) to remove the key.
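A minimal sketch of the first approach, using a hypothetical three-column table as a stand-in for C1, ..., C10. Also worth knowing: starting with data.table 1.9.8, unique() called without by compares all columns rather than the key, so on a recent version the original problem may not appear at all.

```r
library(data.table)

# Hypothetical toy table standing in for C1, ..., C10:
# rows 1 and 2 are exact duplicates, rows 3 and 4 differ only in C2
data <- data.table(C1 = c(1, 1, 2, 2),
                   C2 = c("a", "a", "b", "c"),
                   C3 = c(TRUE, TRUE, FALSE, FALSE))

setkey(data, C1)     # the accidental key
key(data)            # "C1"

setkey(data, NULL)   # remove the key; the row order is left as it is
key(data)            # NULL

deduped <- unique(data)  # no key, so every column is compared
nrow(deduped)            # 3: only the exact duplicate is dropped
```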
Second, unique.data.table has a by option which lets you specify on the fly which columns to use for comparison (regardless of which key is currently set):
unique(data, by = paste0("C", 1:10))
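A sketch of the by option on the same hypothetical three-column table (so paste0("C", 1:3) replaces paste0("C", 1:10)), with the key left in place to show that by takes precedence over it:

```r
library(data.table)

# Hypothetical toy table standing in for C1, ..., C10
data <- data.table(C1 = c(1, 1, 2, 2),
                   C2 = c("a", "a", "b", "c"),
                   C3 = c(TRUE, TRUE, FALSE, FALSE))
setkey(data, C1)     # the key is still set

# `by` decides which columns are compared, whatever the key is
all_cols <- unique(data, by = paste0("C", 1:3))
nrow(all_cols)       # 3

# by = "C1" reproduces the key-only behaviour described in the question
key_only <- unique(data, by = "C1")
nrow(key_only)       # 2

# names(data) avoids hard-coding the number of columns
all_cols2 <- unique(data, by = names(data))
```

Using names(data) is a small convenience: it keeps working if columns are added or removed later.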
Third, instead of using setkey for many keys, use setkeyv to pass a character vector:
setkeyv(data, paste0("C", 1:10))
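A short sketch contrasting the two functions on a hypothetical three-column table: setkey() takes bare column names, while setkeyv() takes a character vector, so the key columns can be computed rather than typed out.

```r
library(data.table)

# Hypothetical toy table standing in for C1, ..., C10
data <- data.table(C1 = c(2, 1, 1),
                   C2 = c("b", "a", "a"),
                   C3 = c(FALSE, TRUE, TRUE))

cols <- paste0("C", 1:3)  # built programmatically
setkeyv(data, cols)       # same as setkey(data, C1, C2, C3)
key(data)                 # "C1" "C2" "C3"
data$C1                   # rows are re-sorted by all key columns: 1 1 2
```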
A thorough reading of ?setkey and ?unique.data.table can provide more details.