Suppose I have a vector of values, such as:
A C A B A C C B B C C A A A B B B B C A
I would like to create a new vector that, for each element, contains the number of elements since that element was last seen. So, for the vector above,
NA NA 2 NA 2 4 1 4 1 3 1 7 1 1 6 1 1 1 8 6
(where NA
indicates that this is the first time the element has been seen).
For example, the first and second A are in position 1 and 3 respectively, a difference of 2; the third and fourth A are in position 4 and 11, a difference of 7, and so on.
Is there a pre-built pipe-compatible function that does this?
I hacked together this function to demonstrate:
# For reproducibility
set.seed(1)
# Example vector
x = sample(LETTERS[1:3], size = 20, replace = TRUE)
compute_lag_counts = function(x, first_time = NA){
# return vector to fill
lag_counts = rep(-1, length(x))
# values to match
vals = unique(x)
# find all positions of all elements in the target vector
match_list = grr::matches(vals, x, list = TRUE)
# compute the lags, then put them in the appropriate place in the return vector
for(i in seq_along(match_list))
lag_counts[x == vals[i]] = c(first_time, diff(sort(match_list[[i]])))
# return vector
return(lag_counts)
}
compute_lag_counts(x)
Although it seems to do what it is supposed to do, I'd rather use someone else's efficient, well-tested solution! My searching has turned up empty, which is surprising to me given that it seems like a common task.
The standard solution to get the count of an element in a vector is using the std::count function. It returns the total number of elements in the specified range that is equal to the target, as shown below: 2. Using std::count_if
Insert all the elements into the set S one by one. 2. Store the total size s of the set using set::size (). 3.The total size s is the number of distinct elements present in the array.
We can Use Sorting to solve the problem in O (nLogn) time. The idea is simple, first sort the array so that all occurrences of every element become consecutive. Once the occurrences become consecutive, we can traverse the sorted array and count distinct elements in O (n) time.
Here's one simple way to count the unique values in a vector: using Size = ptrdiff_t; template< class Item > auto n_unique_items( vector<Item> const& v ) -> Size { return set<Item>{ v.begin(), v.end() }.size(); } Share Improve this answer Follow edited Nov 28 '16 at 18:28 answered Nov 28 '16 at 18:23 Cheers and hth.
Or
ave(seq.int(x), x, FUN = function(x) c(NA, diff(x)))
# [1] NA NA 2 NA 2 4 1 4 1 3 1 7 1 1 6 1 1 1 8 6
We calculate the first diff
erence of the indices for each group of x
.
A data.table
option thanks to @Henrik
library(data.table)
dt = data.table(x)
dt[ , d := .I - shift(.I), x]
dt
Here's a function that would work
compute_lag_counts <- function(x) {
seqs <- split(seq_along(x), x)
unsplit(Map(function(i) c(NA, diff(i)), seqs), x)
}
compute_lag_counts (x)
# [1] NA NA 2 NA 2 4 1 4 1 3 1 7 1 1 6 1 1 1 8 6
Basically you use split()
to separate the indexes where values appear by each unique value in your vector. Then we use the different between the index where they appear to calculate the distance to the previous value. Then we use unstack
to put those values back in the original order.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With