I have a data frame df, and I am trying to subset all rows whose value in column B
occurs more than once in the dataset.
I tried using table to do it, but am having trouble subsetting from the table:
t<-table(df$B)
Then I try subsetting it using:
subset(df, table(df$B)>1)
And I get the error
"Error in x[subset & !is.na(subset)] : object of type 'closure' is not subsettable"
How can I subset my data frame using table counts?
Here is a dplyr solution (using mrFlick's data.frame):
library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n() > 1)
newd
# a b
# 1 1 1
# 2 2 1
# 3 5 4
# 4 6 4
# 5 7 4
# 6 9 6
# 7 10 6
Or, using data.table
setDT(dd)[,if(.N >1) .SD,by=b]
Or using base R
dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
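To answer the question as asked (subsetting with table counts), here is a minimal base R sketch. The data frame dd below is a hypothetical stand-in, chosen so that the result matches the output shown above; it is an assumption, not the original poster's data. The attempt in the question cannot work as written because subset() expects a logical vector with one element per row, while table(df$B) > 1 has one element per distinct value of B. The 'closure' error also suggests that df was not defined as a data frame in that session, so R likely found the built-in function df instead.

```r
# Hypothetical example data (an assumption, not from the original post)
dd <- data.frame(a = 1:10, b = c(1, 1, 2, 3, 4, 4, 4, 5, 6, 6))

# Count how often each distinct value of b occurs
counts <- table(dd$b)

# Values of b that occur more than once; table names are character strings
repeated <- names(counts)[counts > 1]

# Keep only the rows whose b is one of the repeated values
newd <- dd[as.character(dd$b) %in% repeated, ]
newd
```

The comparison is done on character values because table() stores the distinct values as names; converting dd$b explicitly makes that step visible.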
May I suggest an alternative, faster way to do this with data.table?
require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B
(or) you can couple .I (another special variable - see ?data.table), which gives the corresponding row numbers in df, with .N as follows:
setDT(df)[df[, .I[.N > 1L], by=B]$V1]
(or) have a look at @mnel's answer for another variation (using yet another special variable, .SD).