Return df with a column's values that occur more than once [duplicate]

Tags: dataframe, r, subset

I have a data frame df, and I am trying to subset all rows whose value in column B occurs more than once in the dataset.

I tried using table to do it, but am having trouble subsetting from the table:

t<-table(df$B)

Then I try subsetting it using:

subset(df, table(df$B)>1)

And I get the error

"Error in x[subset & !is.na(subset)] : object of type 'closure' is not subsettable"

How can I subset my data frame using table counts?

asked Jul 01 '14 05:07

Chris Robles

2 Answers

Here is a dplyr solution (using mrFlick's data.frame)

library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1)
newd
#    a b 
# 1  1 1 
# 2  2 1 
# 3  5 4 
# 4  6 4 
# 5  7 4 
# 6  9 6 
# 7 10 6

Or, using data.table

setDT(dd)[,if(.N >1) .SD,by=b]

Or using base R

dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
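To answer the question as asked, table counts can also drive the subset. Note that table(dd$b) has one entry per distinct value of b, not one per row, so it cannot serve as a row filter directly; the counts have to be mapped back to the rows first. A minimal sketch, assuming dd looks like the sample below (the data definition is an assumption chosen to match the output shown above, not taken from the original post):

```r
# Assumed sample data; chosen to reproduce the output shown above.
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# One count per distinct value of b.
counts <- table(dd$b)

# Values of b that occur more than once (table names are character strings).
repeated <- names(counts)[counts > 1]

# Keep the rows whose b value is among the repeated ones.
result <- dd[as.character(dd$b) %in% repeated, ]
result
```

The explicit as.character() makes the comparison with the table names visible; for large data the duplicated() version above avoids building the table at all.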

answered Oct 31 '22 03:10

mnel

May I suggest an alternative, faster way to do this with data.table? The following returns the values of B that occur more than once:

require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B

Or you can combine .I (another special variable, see ?data.table), which gives the corresponding row numbers in df, with .N to return the matching rows:

setDT(df)[df[, .I[.N > 1L], by=B]$V1]

Or have a look at @mnel's answer for another variation (using yet another special variable, .SD).
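To make the difference between the two expressions concrete, here is a small sketch that splits them into steps. The sample data and the names dup_values, idx and dup_rows are hypothetical, introduced only for illustration; it assumes the data.table package is installed.

```r
library(data.table)

# Assumed sample data (hypothetical, not from the original post).
df <- data.frame(a = 1:10, B = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))
setDT(df)  # converts df to a data.table by reference

# Values of B that occur more than once.
dup_values <- df[, .N, by = B][N > 1L]$B

# Row numbers of every row belonging to a group with more than one member.
idx <- df[, .I[.N > 1L], by = B]$V1

# The rows themselves.
dup_rows <- df[idx]
dup_rows
```

The first expression is enough when only the repeated values are needed; the .I version is the one that returns the rows of df.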

answered Oct 31 '22 02:10

Mike.Gahan
