Average neighbours inside a vector

Tags: r, vector, difference, neighbours

My data:

data <- c(1,5,11,15,24,31,32,65)

There are 2 neighbours: 31 and 32. I wish to remove them and keep only their mean (i.e. 31.5), so that data becomes:

data <- c(1,5,11,15,24,31.5,65)

It seems simple, but I wish to do it automatically, and sometimes with vectors containing more neighbours. For instance:

data_2 <- c(1,5,11,15,24,31,32,65,99,100,101,140)


asked Dec 10 '18 11:12

Loulou

3 Answers

Here is another idea that creates a group id via cumsum(c(TRUE, diff(a) > 1)), where 1 is the gap threshold:

#our group variable
grp <- cumsum(c(TRUE, diff(a) > 1))

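#keep only the groups of length 1 (i.e. values with no neighbour);
#ave() returns 1 for members of larger groups and 0 for singletons, so ! keeps the singletons
i1 <- a[!ave(a, grp, FUN = function(i) length(i) > 1)]

#Find the mean of the groups with more than 1 element (singleton groups give NaN)
i2 <- unname(tapply(a, grp, function(i) mean(i[length(i) > 1])))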

#Concatenate the above 2 (eliminating NAs from i2) to get final result
c(i1, i2[!is.na(i2)])
#[1]  1.0  5.0 11.0 15.0 24.0 65.0 31.5
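To make the grouping step concrete, here is the intermediate group id for `a` (the vector from the DATA section, redefined so the snippet runs on its own). Every step larger than the gap of 1 starts a new group, so 31 and 32 end up sharing the same id:

```r
a <- c(1, 5, 11, 15, 24, 31, 32, 65)

diff(a)            # steps between consecutive values
# [1]  4  6  4  9  7  1 33

# TRUE marks the start of a new group; cumsum() turns the starts into running ids
grp <- cumsum(c(TRUE, diff(a) > 1))
grp
# [1] 1 2 3 4 5 6 6 7
```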

You can also wrap it in a function. I left the gap as a parameter so you can adjust it:

get_vec <- function(x, gap) {
    grp <- cumsum(c(TRUE, diff(x) > gap))
    i1 <- x[!ave(x, grp, FUN = function(i) length(i) > 1)]
    i2 <- unname(tapply(x, grp, function(i) mean(i[length(i) > 1])))
    return(c(i1, i2[!is.na(i2)]))
}

get_vec(a, 1)
#[1]  1.0  5.0 11.0 15.0 24.0 65.0 31.5

get_vec(a_2, 1)
#[1]   1.0   5.0  11.0  15.0  24.0  65.0 140.0  31.5 100.0

DATA:

a <- c(1,5,11,15,24,31,32,65)
a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)
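Note that get_vec() returns the singletons first and the group means last, so the original order is lost (31.5 is printed after 65). If order matters, a minimal sketch under the same grouping rule is to take the mean of every group directly, since the mean of a one-element group is the value itself. The name get_vec_ordered is hypothetical:

```r
get_vec_ordered <- function(x, gap = 1) {
  # same grouping rule as get_vec(): a step larger than `gap` starts a new group
  grp <- cumsum(c(TRUE, diff(x) > gap))
  # grp is non-decreasing, so tapply() returns the groups in their original order;
  # as.vector() drops the array/dimnames attributes that tapply() adds
  as.vector(tapply(x, grp, mean))
}

a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)
get_vec_ordered(a_2, 1)
# [1]   1.0   5.0  11.0  15.0  24.0  31.5  65.0 100.0 140.0
```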


answered Sep 20 '22 23:09

Sotos

Here is my solution, which uses run-length encoding to identify groups:

foo <- function(x) {
  y <- x - seq_along(x) #normalize to zero differences in groups
  ind <- rle(y) #run-length encoding
  ind$values <- ind$lengths != 1 #to find groups
  ind$values[ind$values] <- cumsum(ind$values[ind$values]) #group ids
  ind <- inverse.rle(ind)
  xnew <- x
  xnew[ind != 0] <- ave(x, ind, FUN = mean)[ind != 0] #calculate means
  xnew[!(duplicated(ind) & ind != 0)] #remove duplicates from groups
}

foo(data)
#[1]  1.0  5.0 11.0 15.0 24.0 31.5 65.0
foo(data_2)
#[1]   1.0   5.0  11.0  15.0  24.0  31.5  65.0 100.0 140.0
data_3 <- c(1, 2, 4, 1, 2)
foo(data_3)
#[1] 1.5 4.0 1.5
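The key step in foo() is y <- x - seq_along(x): subtracting the position turns every run of consecutive integers into a run of equal values, which rle() can then detect. For data:

```r
data <- c(1, 5, 11, 15, 24, 31, 32, 65)

# consecutive integers (31, 32) become equal values (25, 25)
y <- data - seq_along(data)
y
# [1]  0  3  8 11 19 25 25 57

# a run length greater than 1 marks a group of neighbours
rle(y)$lengths
# [1] 1 1 1 1 1 2 1
```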

I assume that you don't need an extremely efficient solution. If you do, I'd recommend a simple C++ for loop in Rcpp.
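For reference, such a loop would be a single pass that extends a run while the next value is exactly one larger. Below is a sketch of that loop written in plain R rather than Rcpp, so it only illustrates the structure, not the speed-up; merge_runs_loop is a hypothetical name, and it assumes integer-valued input and the same "difference of exactly 1" rule as foo():

```r
merge_runs_loop <- function(x) {
  n <- length(x)
  out <- numeric(n)  # there are at most n groups, so preallocate
  k <- 0L            # number of groups written so far
  i <- 1L
  while (i <= n) {
    j <- i
    # extend the current run while the next value is exactly one larger
    while (j < n && x[j + 1L] - x[j] == 1) j <- j + 1L
    k <- k + 1L
    out[k] <- mean(x[i:j])  # a run of length 1 keeps its own value
    i <- j + 1L
  }
  out[seq_len(k)]
}

merge_runs_loop(c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140))
# [1]   1.0   5.0  11.0  15.0  24.0  31.5  65.0 100.0 140.0
merge_runs_loop(c(1, 2, 4, 1, 2))
# [1] 1.5 4.0 1.5
```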

answered Sep 17 '22 23:09

Roland

I have a data.table-based solution; the same idea could probably be translated into dplyr:

library(data.table)
df <- data.table(data2 = c(1,5,11,15,24,31,32,65,99,100,101,140))
df[,neighbours := ifelse(c(0,diff(data2)) == 1,1,0)]
df[,neighbours := ifelse(neighbours == 1 | shift(neighbours, type = "lead", fill = 0) == 1,1,0)]
df[,neigh_seq := rleid(neighbours)]

unique(df[,ifelse(neighbours == 1,mean(data2),data2),by = neigh_seq])

   neigh_seq    V1
1:         1   1.0
2:         1   5.0
3:         1  11.0
4:         1  15.0
5:         1  24.0
6:         2  31.5
7:         3  65.0
8:         4 100.0
9:         5 140.0

What it does: the first line sets neighbours to 1 if the difference from the previous number is 1:

    data2 neighbours
 1:     1          0
 2:     5          0
 3:    11          0
 4:    15          0
 5:    24          0
 6:    31          0
 7:    32          1
 8:    65          0
 9:    99          0
10:   100          1
11:   101          1
12:   140          0

I want to group them so that the neighbours variable is 1 for all members of a group of neighbours, so I also need to set it to 1 for the first element of each group (the one just before each run of 1s):

df[,neighbours := ifelse(neighbours == 1 | shift(neighbours, type = "lead", fill = 0) == 1,1,0)]
    data2 neighbours
 1:     1          0
 2:     5          0
 3:    11          0
 4:    15          0
 5:    24          0
 6:    31          1
 7:    32          1
 8:    65          0
 9:    99          1
10:   100          1
11:   101          1
12:   140          0

Then I group by runs of the neighbours value (rleid) and replace each value by the group mean if it is a neighbour:

df[,ifelse(neighbours == 1,mean(data2),data2),by = rleid(neighbours)]
    rleid    V1
 1:     1   1.0
 2:     1   5.0
 3:     1  11.0
 4:     1  15.0
 5:     1  24.0
 6:     2  31.5
 7:     2  31.5
 8:     3  65.0
 9:     4 100.0
10:     4 100.0
11:     4 100.0
12:     5 140.0

Finally, I take the unique values. Et voilà.
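One caveat with the rleid() grouping, as far as I can tell: two neighbour groups that follow each other directly, e.g. c(31, 32, 40, 41), are all flagged 1 and land in the same rleid group, so they get averaged together; unique() also drops repeated non-neighbour values. A sketch of a variant that avoids both, assuming "neighbours" means a difference of exactly 1, builds the group id with cumsum() instead (dt, grp and res are new names chosen for illustration):

```r
library(data.table)

dt <- data.table(data2 = c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140))

# a step that is not exactly 1 starts a new group; cumsum() numbers the groups
dt[, grp := cumsum(c(TRUE, diff(data2) != 1))]

# one row per group, in order of first appearance; singletons keep their value
res <- dt[, .(value = mean(data2)), by = grp]
res$value
# [1]   1.0   5.0  11.0  15.0  24.0  31.5  65.0 100.0 140.0
```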

answered Sep 20 '22 23:09

denis
