My data:
data <- c(1,5,11,15,24,31,32,65)
There are 2 neighbours: 31 and 32. I want to remove them and keep only their mean value (here 31.5), so that data becomes:
data <- c(1,5,11,15,24,31.5,65)
It seems simple, but I want to do it automatically, including for vectors that contain more groups of neighbours. For instance:
data_2 <- c(1,5,11,15,24,31,32,65,99,100,101,140)
Here is another idea that creates a group id via cumsum(c(TRUE, diff(a) > 1)), where 1 is the gap threshold:
#our group variable: a new group starts wherever the gap to the previous value exceeds 1
grp <- cumsum(c(TRUE, diff(a) > 1))
#keep only the values whose group has length 1 (i.e. with no neighbour)
i1 <- a[ave(a, grp, FUN = length) == 1]
#Find the mean of the groups with more than 1 element (groups of size 1 give NaN)
i2 <- unname(tapply(a, grp, function(i)mean(i[length(i) > 1])))
#Concatenate the above 2 (eliminating NAs from i2) to get final result
c(i1, i2[!is.na(i2)])
#[1] 1.0 5.0 11.0 15.0 24.0 65.0 31.5
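To see what the group id looks like, here is a small sketch that prints the intermediate vector for a (the name starts is introduced here only for illustration). Each TRUE marks the start of a new group, so the running sum gives the same id to values separated by a gap of at most 1:

```r
a <- c(1, 5, 11, 15, 24, 31, 32, 65)

# TRUE marks the start of a new group: the first value, and every value
# whose gap to the previous one is larger than the threshold 1
starts <- c(TRUE, diff(a) > 1)

# The running count of group starts is the group id of each value
grp <- cumsum(starts)
print(grp)
#[1] 1 2 3 4 5 6 6 7
```

Only group 6 has more than one member, which is why 31 and 32 are the only values averaged.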
You can also wrap it in a function. I left the gap as a parameter so you can adjust it:
get_vec <- function(x, gap) {
grp <- cumsum(c(TRUE, diff(x) > gap))
i1 <- x[ave(x, grp, FUN = length) == 1]
i2 <- unname(tapply(x, grp, function(i) mean(i[length(i) > 1])))
return(c(i1, i2[!is.na(i2)]))
}
get_vec(a, 1)
#[1] 1.0 5.0 11.0 15.0 24.0 65.0 31.5
get_vec(a_2, 1)
#[1] 1.0 5.0 11.0 15.0 24.0 65.0 140.0 31.5 100.0
DATA:
a <- c(1,5,11,15,24,31,32,65)
a_2 <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)
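Note that get_vec() returns the isolated values first and the group means at the end. If you would rather keep the original order, one possible simplification (my own sketch, not part of the answer above; get_vec_ordered is a hypothetical name) is to average every group, since the mean of a one-element group is just that element. Like get_vec(), it assumes the input is sorted in increasing order:

```r
# Hypothetical order-preserving variant of get_vec() (not the answer's code)
get_vec_ordered <- function(x, gap = 1) {
  # Same group id as get_vec(): a new group starts after any gap larger than `gap`
  grp <- cumsum(c(TRUE, diff(x) > gap))
  # Average every group; a one-element group simply returns its own value.
  # split() orders groups by increasing id, i.e. the order they appear in x
  unname(vapply(split(x, grp), mean, numeric(1)))
}

get_vec_ordered(c(1, 5, 11, 15, 24, 31, 32, 65))
get_vec_ordered(c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140))
```

The second call should give 1, 5, 11, 15, 24, 31.5, 65, 100, 140, i.e. the same values as get_vec(a_2, 1) but in the order they occur.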
Here is my solution, which uses run-length encoding to identify groups:
foo <- function(x) {
y <- x - seq_along(x) #constant within each run of consecutive integers
ind <- rle(y) #run-length encoding
ind$values <- ind$lengths != 1 #to find groups
ind$values[ind$values] <- cumsum(ind$values[ind$values]) #group ids
ind <- inverse.rle(ind)
xnew <- x
xnew[ind != 0] <- ave(x, ind, FUN = mean)[ind != 0] #calculate means
xnew[!(duplicated(ind) & ind != 0)] #remove duplicates from groups
}
foo(data)
#[1] 1.0 5.0 11.0 15.0 24.0 31.5 65.0
foo(data_2)
#[1] 1.0 5.0 11.0 15.0 24.0 31.5 65.0 100.0 140.0
data_3 <- c(1, 2, 4, 1, 2)
foo(data_3)
#[1] 1.5 4.0 1.5
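The key step in foo() is the subtraction of seq_along(x): inside a run of consecutive integers both x and its position increase by 1, so their difference stays constant and rle() sees the run as one entry. A small sketch printing those intermediate values for data:

```r
x <- c(1, 5, 11, 15, 24, 31, 32, 65)

# Subtracting the position turns each run of consecutive integers into equal values
y <- x - seq_along(x)
print(y)
#[1]  0  3  8 11 19 25 25 57

# rle() then reports the run 31, 32 as a single entry of length 2
r <- rle(y)
print(r$lengths)
#[1] 1 1 1 1 1 2 1
```

Entries with a length other than 1 are exactly the groups that foo() numbers and averages.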
I assume that you don't need an extremely efficient solution. If you do, I'd recommend a simple C++ for loop via Rcpp.
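For reference, the single-pass loop suggested above could look roughly like this in plain R before porting it to C++ (a sketch under the assumption that x is sorted in increasing order, as in the examples; merge_neighbours is a hypothetical name). It scans the vector once and averages each run of values that are within gap of each other:

```r
# Hypothetical single-pass loop; assumes x is sorted in increasing order
merge_neighbours <- function(x, gap = 1) {
  n <- length(x)
  out <- numeric(n)  # preallocate: there are at most n groups
  k <- 0             # number of groups written so far
  i <- 1
  while (i <= n) {
    j <- i
    # Extend the current run while the next value is within `gap` of this one
    while (j < n && x[j + 1] - x[j] <= gap) j <- j + 1
    k <- k + 1
    out[k] <- mean(x[i:j])
    i <- j + 1
  }
  out[seq_len(k)]
}

merge_neighbours(c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140))
```

Preallocating out avoids growing the vector inside the loop; the same structure maps directly onto a C++ loop over a NumericVector.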
I have a data.table-based solution; the same idea could probably be translated into dplyr:
library(data.table)
df <- data.table(data2 = c(1,5,11,15,24,31,32,65,99,100,101,140))
df[,neighbours := ifelse(c(0,diff(data2)) == 1,1,0)]
df[,neighbours := pmax(neighbours, shift(neighbours, type = "lead", fill = 0))]
df[,neigh_seq := rleid(neighbours)]
unique(df[,ifelse(neighbours == 1,mean(data2),data2),by = neigh_seq])
neigh_seq V1
1: 1 1.0
2: 1 5.0
3: 1 11.0
4: 1 15.0
5: 1 24.0
6: 2 31.5
7: 3 65.0
8: 4 100.0
9: 5 140.0
What it does: the first line sets neighbours to 1 if the difference with the previous number is 1:
data2 neighbours
1: 1 0
2: 5 0
3: 11 0
4: 15 0
5: 24 0
6: 31 0
7: 32 1
8: 65 0
9: 99 0
10: 100 1
11: 101 1
12: 140 0
I want to group so that the neighbours variable is 1 for all neighbours, so I also need to set it to 1 for the first value of each group (the one just before each run of 1s):
df[,neighbours := pmax(neighbours, shift(neighbours, type = "lead", fill = 0))]
data2 neighbours
1: 1 0
2: 5 0
3: 11 0
4: 15 0
5: 24 0
6: 31 1
7: 32 1
8: 65 0
9: 99 1
10: 100 1
11: 101 1
12: 140 0
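Taken together, the two steps flag a value when its difference to either the previous or the next value is exactly 1. A base-R sketch of the same flag (not the author's code; d and is_neighbour are illustrative names):

```r
x <- c(1, 5, 11, 15, 24, 31, 32, 65, 99, 100, 101, 140)

# d[i] is TRUE when x[i + 1] is exactly 1 larger than x[i]
d <- diff(x) == 1

# A value is a neighbour if its predecessor or its successor is 1 away
is_neighbour <- c(FALSE, d) | c(d, FALSE)
as.integer(is_neighbour)
#[1] 0 0 0 0 0 1 1 0 1 1 1 0
```

This reproduces the neighbours column shown in the table above.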
Then I group by runs of the neighbours value and replace each value with its group mean if it is a neighbour:
df[,ifelse(neighbours == 1,mean(data2),data2),by = rleid(neighbours)]
rleid V1
1: 1 1.0
2: 1 5.0
3: 1 11.0
4: 1 15.0
5: 1 24.0
6: 2 31.5
7: 2 31.5
8: 3 65.0
9: 4 100.0
10: 4 100.0
11: 4 100.0
12: 5 140.0
Finally, I take the unique values. Voilà.