 

Count of number of elements between distinct elements in vector

Suppose I have a vector of values, such as:

A C A B A C C B B C C A A A B B B B C A

I would like to create a new vector that, for each element, contains the distance (difference in positions) back to the previous occurrence of the same value. So, for the vector above, the result would be:

NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6

(where NA indicates that this is the first time the element has been seen).

For example, the first and second A are in positions 1 and 3 respectively, a difference of 2; the third and fourth A are in positions 5 and 12, a difference of 7, and so on.
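To make the arithmetic in that example concrete, the positions of a single value can be pulled out with which() and differenced with diff(). This sketch types the example vector in directly instead of relying on set.seed(), because the output of sample() changed in R 3.6.0. The variable name pos_A is just for illustration.

```r
# Example vector from the question, typed out so the result does not
# depend on how set.seed()/sample() behave in a given R version
x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")

# Positions at which "A" occurs
pos_A <- which(x == "A")
pos_A
#> [1]  1  3  5 12 13 14 20

# Gaps between consecutive occurrences of "A"
diff(pos_A)
#> [1] 2 2 7 1 1 6
```

These gaps are exactly the non-NA entries at the A positions of the desired result; the first A gets NA.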

Is there a pre-built pipe-compatible function that does this?

I hacked together this function to demonstrate:

# For reproducibility
set.seed(1)

# Example vector
x = sample(LETTERS[1:3], size = 20, replace = TRUE)


compute_lag_counts = function(x, first_time = NA){
  # return vector to fill
  lag_counts = rep(-1, length(x))
  # values to match
  vals = unique(x)
  # find all positions of all elements in the target vector
  match_list = grr::matches(vals, x, list = TRUE)
  # compute the lags, then put them in the appropriate place in the return vector
  for(i in seq_along(match_list))
    lag_counts[x == vals[i]] = c(first_time, diff(sort(match_list[[i]])))
  
  # return vector
  return(lag_counts)
}

compute_lag_counts(x)

Although it seems to do what it is supposed to do, I'd rather use someone else's efficient, well-tested solution! My searching has come up empty, which is surprising to me given that it seems like a common task.
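For comparison, the same result can be produced without the grr dependency by walking the vector once and remembering where each value was last seen. This is a minimal sketch, not a well-tested library function: the name lag_since_last() is hypothetical, and it assumes x contains no NA values or empty strings, which this version does not handle.

```r
# Hypothetical helper (not from any package): one pass over x,
# remembering the last index at which each value was seen.
# Assumes x has no NA values or empty strings.
lag_since_last <- function(x, first_time = NA) {
  out <- rep(first_time, length(x))
  # Environment used as a hash map: value -> last index seen
  last_seen <- new.env(hash = TRUE, parent = emptyenv())
  for (i in seq_along(x)) {
    key <- as.character(x[i])
    prev <- last_seen[[key]]            # NULL the first time a value appears
    if (!is.null(prev)) out[i] <- i - prev
    last_seen[[key]] <- i               # update last-seen position
  }
  out
}

x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")
lag_since_last(x)
#> [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6
```

This does O(n) work, but because the loop runs in interpreted R it will likely be slower than the vectorized answers for long vectors.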

asked Jul 06 '20 20:07 by richarddmorey



2 Answers

In base R, use ave():

ave(seq_along(x), x, FUN = function(i) c(NA, diff(i)))
#  [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6

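We calculate the first difference of the indices within each group of x: ave() splits the indices by the values of x, applies the function to each group, and returns the results in the original order, with NA marking each value's first occurrence.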
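Since the question asks for something pipe-compatible, the ave() call can be wrapped in a function whose first argument is the vector. This is a sketch; the name gap_since_last() and its first_time argument (mirroring the question's compute_lag_counts()) are hypothetical. With R 4.1 or later it could be used as x |> gap_since_last(), or as x %>% gap_since_last() with magrittr.

```r
# Hypothetical wrapper around base R's ave(); x comes first, so the
# function can sit on the right-hand side of a pipe.
gap_since_last <- function(x, first_time = NA) {
  # ave() splits the indices by value of x, applies FUN to each group,
  # and writes the results back in the original positions
  ave(seq_along(x), x, FUN = function(i) c(first_time, diff(i)))
}

x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")
gap_since_last(x)
#> [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6
gap_since_last(x, first_time = 0)[1:4]
#> [1] 0 0 2 0
```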


A data.table option, thanks to @Henrik:

library(data.table)
dt = data.table(x)
dt[ , d := .I - shift(.I), x]
dt
answered Oct 26 '22 14:10 by markus


Here's a function that works:

compute_lag_counts <- function(x) {
  seqs <- split(seq_along(x), x)
  unsplit(Map(function(i) c(NA, diff(i)), seqs), x)
}

compute_lag_counts(x)
# [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6

Basically, you use split() to group the indexes at which each unique value appears in your vector. Then we take the difference between consecutive indexes within each group to get the distance to the previous occurrence of that value, with NA for the first occurrence. Finally, we use unsplit() to put those values back in the original order.
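To see what each step produces, here are the intermediate values for the example vector. This is an illustrative walk-through using the same base R functions (split(), Map(), unsplit()); the name gaps for the intermediate list is just for illustration.

```r
x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")

# Step 1: indexes grouped by value (groups are ordered A, B, C)
seqs <- split(seq_along(x), x)
seqs$A
#> [1]  1  3  5 12 13 14 20
seqs$B
#> [1]  4  8  9 15 16 17 18

# Step 2: gaps between consecutive indexes within each group
gaps <- Map(function(i) c(NA, diff(i)), seqs)
gaps$A
#> [1] NA  2  2  7  1  1  6

# Step 3: put each group's gaps back at the positions they came from
res <- unsplit(gaps, x)
res
#> [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6
```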

answered Oct 26 '22 13:10 by MrFlick