
Average neighbours inside a vector

My data :

data <- c(1,5,11,15,24,31,32,65)

There are 2 neighbours: 31 and 32. I wish to remove them and keep only their mean value (i.e. 31.5), so that data becomes:

data <- c(1,5,11,15,24,31.5,65)

It seems simple, but I wish to do it automatically, and sometimes with vectors containing more than one run of neighbours. For instance:

data_2 <- c(1,5,11,15,24,31,32,65,99,100,101,140)
Loulou asked Dec 10 '18 11:12


3 Answers

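Here is another idea: build a group id with cumsum(c(TRUE, diff(a) > 1)), where 1 is the gap threshold. A new group starts whenever the difference from the previous element exceeds the threshold, so neighbours share the same id, i.e.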

#our group variable
grp <- cumsum(c(TRUE, diff(a) > 1))

#keep only the values whose group has length 1 (i.e. values with no neighbour)
i1 <- a[ave(a, grp, FUN = length) == 1]

#mean of each group with more than 1 element (NaN for groups of length 1)
i2 <- unname(tapply(a, grp, function(i) mean(i[length(i) > 1])))

#concatenate the two, dropping the NaNs from i2, to get the final result
#(the means are appended at the end, so the original order is not kept)
c(i1, i2[!is.na(i2)])
#[1]  1.0  5.0 11.0 15.0 24.0 65.0 31.5

You can also wrap it in a function. I left the gap as a parameter so you can adjust it:

get_vec <- function(x, gap) {
    grp <- cumsum(c(TRUE, diff(x) > gap))
    #values that have no neighbour
    i1 <- x[ave(x, grp, FUN = length) == 1]
    #means of the groups of neighbours (NaN for groups of length 1)
    i2 <- unname(tapply(x, grp, function(i) mean(i[length(i) > 1])))
    return(c(i1, i2[!is.na(i2)]))
}

get_vec(a, 1)
#[1]  1.0  5.0 11.0 15.0 24.0 65.0 31.5

get_vec(a_2, 1)
#[1]   1.0   5.0  11.0  15.0  24.0  65.0 140.0  31.5 100.0

DATA:

a <- c(1,5,11,15,24,31,32,65)
a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)
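
If the averaged values should stay in their original position, as in the output the question asks for, the same group id can be passed straight to tapply(), because the mean of a group of length 1 is the value itself. This is only a minimal sketch under that assumption; the function name average_neighbours is made up for illustration and is not part of the answer above.

```r
# Hypothetical helper (name invented for this sketch).
# `gap` has the same meaning as in get_vec(): differences larger than
# `gap` start a new group.
average_neighbours <- function(x, gap = 1) {
  grp <- cumsum(c(TRUE, diff(x) > gap))   # same group id as above
  # tapply() returns one mean per group, in group order;
  # as.vector() drops the names and the dim attribute
  as.vector(tapply(x, grp, mean))
}

a   <- c(1, 5, 11, 15, 24, 31, 32, 65)
a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)

res_a   <- average_neighbours(a)     # 1, 5, 11, 15, 24, 31.5, 65
res_a_2 <- average_neighbours(a_2)   # 1, 5, 11, 15, 24, 31.5, 65, 100, 140
```

Like get_vec(), this assumes a non-empty, sorted numeric vector.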
Sotos answered Sep 20 '22 23:09


Here is my solution, which uses run-length encoding to identify groups:

foo <- function(x) {
  y <- x - seq_along(x) #normalize to zero differences in groups
  ind <- rle(y) #run-length encoding
  ind$values <- ind$lengths != 1 #to find groups
  ind$values[ind$values] <- cumsum(ind$values[ind$values]) #group ids
  ind <- inverse.rle(ind)
  xnew <- x
  xnew[ind != 0] <- ave(x, ind, FUN = mean)[ind != 0] #calculate means
  xnew[!(duplicated(ind) & ind != 0)] #remove duplicates from groups
}

foo(data)
#[1]  1.0  5.0 11.0 15.0 24.0 31.5 65.0
foo(data_2)
#[1]   1.0   5.0  11.0  15.0  24.0  31.5  65.0 100.0 140.0
data_3 <- c(1, 2, 4, 1, 2)
foo(data_3)
#[1] 1.5 4.0 1.5

I assume that you don't need an extremely efficient solution. If you do, I'd recommend a simple C++ for loop in Rcpp.
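
To show what such a loop could look like, here is a single-pass version written in plain R rather than Rcpp, so it illustrates the logic and not the speed. It is a sketch only: the name average_runs is invented, and it treats two values as neighbours when they differ by exactly 1, which is the same rule foo() applies.

```r
# Hypothetical helper (name invented for this sketch): one pass over x,
# closing the current run whenever the next value is not previous + 1.
average_runs <- function(x) {
  n <- length(x)
  if (n == 0L) return(numeric(0))
  out <- numeric(n)   # worst case: no neighbours, so n results
  k <- 0L             # number of runs written so far
  run_sum <- x[1]
  run_len <- 1L
  for (i in seq_len(n)[-1]) {
    if (x[i] - x[i - 1] == 1) {
      # still inside the current run of neighbours
      run_sum <- run_sum + x[i]
      run_len <- run_len + 1L
    } else {
      # run ended: store its mean and start a new run at x[i]
      k <- k + 1L
      out[k] <- run_sum / run_len
      run_sum <- x[i]
      run_len <- 1L
    }
  }
  # store the last run
  k <- k + 1L
  out[k] <- run_sum / run_len
  out[seq_len(k)]
}

res_1 <- average_runs(c(1, 5, 11, 15, 24, 31, 32, 65))
res_3 <- average_runs(c(1, 2, 4, 1, 2))   # 1.5, 4, 1.5, same as foo(data_3)
```

An Rcpp version would presumably follow the same structure, with the loop body moved into C++.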

Roland answered Sep 17 '22 23:09


I have a data.table-based solution; the same could probably be translated into dplyr:

library(data.table)
df <- data.table(data2 = c(1,5,11,15,24,31,32,65,99,100,101,140))
# 1 if the difference with the previous value is 1 (uses the column data2)
df[,neighbours := ifelse(c(0,diff(data2)) == 1,1,0)]
# also flag the element just before each flagged run
df[,neighbours := pmax(neighbours, shift(neighbours, type = "lead", fill = 0))]
df[,neigh_seq := rleid(neighbours)]

unique(df[,ifelse(neighbours == 1,mean(data2),data2),by = neigh_seq])

   neigh_seq    V1
1:         1   1.0
2:         1   5.0
3:         1  11.0
4:         1  15.0
5:         1  24.0
6:         2  31.5
7:         3  65.0
8:         4 100.0
9:         5 140.0

What it does: the first := line sets neighbours to 1 if the difference with the previous number is 1:

    data2 neighbours
 1:     1          0
 2:     5          0
 3:    11          0
 4:    15          0
 5:    24          0
 6:    31          0
 7:    32          1
 8:    65          0
 9:    99          0
10:   100          1
11:   101          1
12:   140          0

I want the neighbours variable to be 1 for every member of a run of neighbours, so the element just before each flagged run must be set to 1 as well:

df[,neighbours := pmax(neighbours, shift(neighbours, type = "lead", fill = 0))]
    data2 neighbours
 1:     1          0
 2:     5          0
 3:    11          0
 4:    15          0
 5:    24          0
 6:    31          1
 7:    32          1
 8:    65          0
 9:    99          1
10:   100          1
11:   101          1
12:   140          0

Then I group on each change of the neighbours value, and replace the values by the group mean where they are neighbours:

df[,ifelse(neighbours == 1,mean(data2),data2),by = rleid(neighbours)]
    rleid    V1
 1:     1   1.0
 2:     1   5.0
 3:     1  11.0
 4:     1  15.0
 5:     1  24.0
 6:     2  31.5
 7:     2  31.5
 8:     3  65.0
 9:     4 100.0
10:     4 100.0
11:     4 100.0
12:     5 140.0

Finally, take the unique values. And voilà.
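
For comparison, the flag and rleid steps can be collapsed into a single run id that increases at every gap. This is a sketch and not part of the answer above: the column names run_id and value are invented, and it assumes "neighbours" means a difference of exactly 1.

```r
library(data.table)

df <- data.table(data2 = c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140))

# A new run starts wherever the step from the previous value is not exactly 1
# (run_id is a hypothetical column name).
df[, run_id := cumsum(c(TRUE, diff(data2) != 1))]

# One row per run; a run of length 1 is its own mean.
res <- df[, .(value = mean(data2)), by = run_id]
res$value   # 1, 5, 11, 15, 24, 31.5, 65, 100, 140
```

Because the id changes at every gap, two runs that follow each other directly, such as 1, 2, 4, 5, stay separate.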

denis answered Sep 20 '22 23:09