Count of number of elements between distinct elements in vector

Tags: r, vector, difference, grouping

Suppose I have a vector of values, such as:

A C A B A C C B B C C A A A B B B B C A

I would like to create a new vector that, for each element, contains the number of elements since that element was last seen. So, for the vector above,

NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6

(where NA indicates that this is the first time the element has been seen).

For example, the first and second A are in positions 1 and 3 respectively, a difference of 2; the third and fourth A are in positions 5 and 12, a difference of 7, and so on.
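
As a quick check of the arithmetic, here is a minimal base R sketch that writes the example vector out literally (rather than relying on `sample()`, whose output depends on the R version) and looks only at the positions of `A`. The name `a_pos` is just an illustrative helper.

```r
# The example vector from the question, written out literally
x <- c("A", "C", "A", "B", "A", "C", "C", "B", "B", "C",
       "C", "A", "A", "A", "B", "B", "B", "B", "C", "A")

# Positions at which "A" occurs
a_pos <- which(x == "A")
a_pos
# [1]  1  3  5 12 13 14 20

# Gaps between consecutive occurrences of "A"
a_gaps <- diff(a_pos)
a_gaps
# [1] 2 2 7 1 1 6
```

These gaps are the values that appear at the positions of the second and later `A`s in the desired output; the first `A` gets `NA`.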

Is there a pre-built pipe-compatible function that does this?

I hacked together this function to demonstrate:

# For reproducibility
set.seed(1)

# Example vector
x = sample(LETTERS[1:3], size = 20, replace = TRUE)


compute_lag_counts = function(x, first_time = NA){
  # return vector to fill
  lag_counts = rep(-1, length(x))
  # values to match
  vals = unique(x)
  # find all positions of all elements in the target vector
  match_list = grr::matches(vals, x, list = TRUE)
  # compute the lags, then put them in the appropriate place in the return vector
  for(i in seq_along(match_list))
    lag_counts[x == vals[i]] = c(first_time, diff(sort(match_list[[i]])))
  
  # return vector
  return(lag_counts)
}

compute_lag_counts(x)

Although it seems to do what it is supposed to do, I'd rather use someone else's efficient, well-tested solution! My searching has turned up empty, which is surprising to me given that it seems like a common task.

asked Jul 06 '20 20:07

richarddmorey

2 Answers

ave(seq_along(x), x, FUN = function(i) c(NA, diff(i)))
#  [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6

We calculate the first difference of the indices for each group of x.
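
To see what `ave()` is doing here, the following sketch runs the same idea on a shorter, made-up vector (`y`, `idx` and `res` are illustrative names, not from the answer). It assumes only base R.

```r
# Hypothetical short example vector
y <- c("A", "C", "A", "B", "A", "C")

# The indices 1..length(y); these are what get differenced
idx <- seq_along(y)

# ave() applies FUN to the indices within each group of y and
# returns the results in the original order
res <- ave(idx, y, FUN = function(i) c(NA, diff(i)))
res
# [1] NA NA  2 NA  2  4
```

The first occurrence of each value has no predecessor, so it receives `NA`; every later occurrence receives its index minus the index of the previous occurrence of the same value.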

A data.table option thanks to @Henrik

library(data.table)
dt = data.table(x)
dt[, d := .I - shift(.I), by = x]
dt

answered Oct 26 '22 14:10

markus

Here's a function that would work

compute_lag_counts <- function(x) {
  seqs <- split(seq_along(x), x)
  unsplit(Map(function(i) c(NA, diff(i)), seqs), x)
}

compute_lag_counts(x)
# [1] NA NA  2 NA  2  4  1  4  1  3  1  7  1  1  6  1  1  1  8  6

Basically, split() separates the indices at which values appear, grouped by each unique value in the vector. The difference between consecutive indices within each group gives the distance to the previous occurrence, with NA for the first occurrence. Finally, unsplit() puts those values back in the original order.
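
The three steps can be traced one at a time on a small, made-up vector. This is only a sketch for illustration; `y`, `seqs`, `gaps` and `out` are hypothetical names, and everything used is base R.

```r
# Hypothetical short example vector
y <- c("A", "C", "A", "B", "A", "C")

# 1. Group the positions by value:
#    A -> 1 3 5, B -> 4, C -> 2 6
seqs <- split(seq_along(y), y)

# 2. Within each group, take the gap to the previous occurrence,
#    with NA for the first occurrence:
#    A -> NA 2 2, B -> NA, C -> NA 4
gaps <- Map(function(i) c(NA, diff(i)), seqs)

# 3. Put the gaps back in the original order of y
out <- unsplit(gaps, y)
out
# [1] NA NA  2 NA  2  4
```

Because a function built this way takes the vector as its first argument and returns a vector of the same length, it should also work in a pipe, for example `x |> compute_lag_counts()` with the native pipe in R 4.1 or later.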

answered Oct 26 '22 13:10

MrFlick
