Undo setkey() on data.table in R

Tags:

r

duplicates

key

data.table

I have a data.table (data in the following) with 10 columns (C1, ..., C10) and I want to delete duplicate rows.

I accidentally used setkey(data,C1), so now when I run unique(data) I only get unique rows based on the column C1, while I want to remove a row only if it's identical to another one on all the columns C1, ..., C10.
Is there a way to undo the setkey() operation? I found this question but it didn't help me solve my problem.

PS: I can get around the problem by setting all columns in my data.table as keys with setkeyv(data, paste0("C", 1:10)), but this is not at all an elegant/practical solution.

320

asked Jun 04 '16 09:06

hellter

1 Answer

First, you can use setkey(data, NULL) to remove the key.
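
As a quick illustration, here is a minimal sketch on a toy table (a hypothetical three-column stand-in for the asker's `data`, not the real thing). It sets a key by accident and then clears it, checking the result with `key()` and `haskey()`:

```r
library(data.table)

# Hypothetical toy table standing in for `data` (3 columns instead of 10)
data <- data.table(C1 = c(2, 1, 1),
                   C2 = c("b", "a", "b"),
                   C3 = c(TRUE, TRUE, FALSE))

setkey(data, C1)     # the accidental key; also sorts the rows by C1
key(data)            # "C1"
haskey(data)         # TRUE

setkey(data, NULL)   # remove the key (by reference, no copy)
key(data)            # NULL
haskey(data)         # FALSE
```

Note that this only drops the key attribute. The rows stay in the order `setkey()` sorted them into, so the original row order is not restored.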

Second, unique.data.table has a by option which will allow you to specify on the fly which columns to use for comparison (regardless of which key is currently set):

unique(data, by = paste0("C", 1:10))
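
To see the difference `by` makes, here is a small sketch on a made-up two-column table (an assumption for illustration; with the asker's table you would keep `paste0("C", 1:10)`). I use `names(data)` here, which is equivalent only when the table holds nothing but the columns you want to compare:

```r
library(data.table)

# Hypothetical toy table: the first two rows are identical on every column,
# the third shares C1 with them but differs in C2
data <- data.table(C1 = c(1, 1, 1, 2),
                   C2 = c("a", "a", "b", "b"))
setkey(data, C1)

# Compare on all columns, whatever key is currently set
dedup_all <- unique(data, by = names(data))
nrow(dedup_all)   # 3

# Compare on C1 only
dedup_c1 <- unique(data, by = "C1")
nrow(dedup_c1)    # 2
```

As far as I know, data.table 1.9.8 and later compare on all columns by default, so on a recent version plain `unique(data)` should already ignore the key; passing `by` explicitly works either way.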

Third, instead of using setkey for many keys, use setkeyv to pass a character vector:

setkeyv(data, paste0("C", 1:10))
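
For comparison, a short sketch of the two spellings on a hypothetical three-column table: `setkey()` takes bare column names, while `setkeyv()` takes them as a character vector, which is what makes it convenient when the names are generated:

```r
library(data.table)

# Hypothetical toy table with columns C1..C3
data <- data.table(C1 = c(2, 1, 1),
                   C2 = c("b", "b", "a"),
                   C3 = c(10, 20, 30))

key_cols <- paste0("C", 1:3)   # "C1" "C2" "C3"
setkeyv(data, key_cols)        # same effect as setkey(data, C1, C2, C3)
key(data)                      # "C1" "C2" "C3"
```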

A thorough reading of ?setkey and ?unique.data.table can provide some more details.

123

answered Sep 21 '22 14:09

MichaelChirico
