
Removing values from a vector that are not duplicated at least x number of times

Given a vector, e.g.:

a = c(1, 2, 2, 4, 5, 3, 5, 3, 2, 1, 5, 3)

Using a[a %in% a[duplicated(a)]] I can remove the values that are not duplicated. However, this only removes values that occur exactly once.

How would I go about removing all values that don't appear at least three times in the vector (or more, in other situations)?
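To make the limitation concrete, here is a small sketch of what the duplicated() approach returns for the example vector. The variable name kept is only illustrative and not part of the original question.

```r
a <- c(1, 2, 2, 4, 5, 3, 5, 3, 2, 1, 5, 3)

# duplicated() flags every occurrence after the first one,
# so any value seen at least twice survives the %in% filter
kept <- a[a %in% a[duplicated(a)]]
kept
# [1] 1 2 2 5 3 5 3 2 1 5 3
```

Only 4 is dropped; 1 stays because it occurs twice.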

The expected result would be:

2 2 5 3 5 3 2 5 3

with 1 and 4 removed, as they are present only twice and once, respectively.

Ale asked Dec 10 '22 22:12


2 Answers

You can do this in one line with the ave function:

a[ave(a, a, FUN=length) >= 3]
# [1] 2 2 5 3 5 3 2 5 3

The call to ave(a, a, FUN=length) returns, for each element a[i] in vector a, the total number of times a[i] appears within a. Then you can subset a, limiting to the indices where the total number of times is 3 or more.
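As a rough illustration of the intermediate step, the sketch below prints the per-element counts and wraps the one-liner in a helper so the threshold can vary. The name keep_at_least is hypothetical, and the helper assumes a numeric vector without NA values.

```r
a <- c(1, 2, 2, 4, 5, 3, 5, 3, 2, 1, 5, 3)

# For each element, how many times its value occurs in a
counts <- ave(a, a, FUN = length)
counts
# [1] 2 3 3 1 3 3 3 3 3 2 3 3

# Hypothetical helper: keep values occurring at least n times
# (assumes x is numeric and contains no NA)
keep_at_least <- function(x, n) x[ave(x, x, FUN = length) >= n]

keep_at_least(a, 3)
# [1] 2 2 5 3 5 3 2 5 3
keep_at_least(a, 2)
# [1] 1 2 2 5 3 5 3 2 1 5 3
```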

josliber answered Jan 18 '23 13:01


Reasonably straightforward (longer than using ave but possibly more comprehensible):

x <- c(1,2,2,4,5,3,5,3,2,1,5,3)
tt <- table(x)   ## tabulate
## find relevant values
ttr <- as.numeric(names(tt)[tt>=3])
x[x %in% ttr]  ## subset
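
The as.numeric() conversion above ties the approach to numeric input. A possible variant, sketched here under the assumption that the vector has no NA values, looks each element's count up by name in the table instead, so it should also work for character vectors. The function name keep_frequent is made up for this example.

```r
# Hypothetical helper: keep elements whose value occurs at least n times
# (assumes x contains no NA values)
keep_frequent <- function(x, n) {
  counts <- table(x)                          # counts, named by value
  per_elem <- as.vector(counts[as.character(x)])  # count for each element
  x[per_elem >= n]
}

keep_frequent(c(1, 2, 2, 4, 5, 3, 5, 3, 2, 1, 5, 3), 3)
# [1] 2 2 5 3 5 3 2 5 3
keep_frequent(c("a", "b", "b", "c", "b", "a"), 2)
# [1] "a" "b" "b" "b" "a"
```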

Ben Bolker answered Jan 18 '23 14:01