
Using dplyr distinct to ignore geometries of sf object in R

I have a dataset with multiple polygons in different locations that share the same attributes. I only want one polygon in my dataset for each set of unique attributes (in my example below, that would be Area and Zone). I don't care where the polygons are, so I want to ignore the geometry column.

library(sf)
library(dplyr)

Areas <- st_as_sf(tibble(
  Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
  Zone = c("Area27", "Area27", "Area42", "Area27"),
  lng = c(20.1, 20.2, 20.1, 20.1),
  lat = c(-1.1, -1.2, -1.1, -1.1)),
  coords = c("lng", "lat")) %>% st_buffer(100)

I am using dplyr's distinct() to remove duplicate records, but the geometry column still seems to be used to determine which records are distinct, even though I believe the call below should ignore it:

Areas %>% distinct(across(-geometry),.keep_all=TRUE)

However, it returns two results for Zone1 and Area27 when the geometry is different. Is this expected behaviour, or am I doing something wrong?

My required output would have only two rows, one for Zone1 & Area27 and another for Zone2 & Area42, with the geometry for those rows retained. That is similar to what happens when you run the same code on a normal tibble:

Table <- tibble(
  Area =c("Zone1", "Zone1","Zone2","Zone1"),
  Zone =c("Area27","Area27","Area42","Area27"),
  lng = c(20.1, 20.2, 20.1, 20.1),
  lat = c(-1.1, -1.2, -1.1, -1.1))

Table %>% distinct(across(c(-lng,-lat)),.keep_all=TRUE)  
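
As a quick sanity check on the expected row count, the attribute table can be inspected with the geometry dropped first. This is only a minimal sketch using the example data above; `attribute_combos` is a name introduced here for illustration, and the result is a plain tibble, so it confirms the number of unique combinations but does not retain any geometry.

```r
library(sf)
library(dplyr)

# Same example data as in the question
Areas <- st_as_sf(
  tibble(
    Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
    Zone = c("Area27", "Area27", "Area42", "Area27"),
    lng  = c(20.1, 20.2, 20.1, 20.1),
    lat  = c(-1.1, -1.2, -1.1, -1.1)
  ),
  coords = c("lng", "lat")
) %>%
  st_buffer(100)

# Drop the geometry so that only attribute columns take part in distinct()
attribute_combos <- Areas %>%
  st_drop_geometry() %>%
  distinct(Area, Zone)

nrow(attribute_combos)  # 2 unique Area/Zone combinations
```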
Chris asked Sep 16 '25 18:09


1 Answer

I found an alternative method:

Areas %>%
  group_by(Area, Zone) %>%
  mutate(id = row_number()) %>%
  filter(id == 1) %>%
  select(-id)

If your dataset contains a lot of polygons, this is likely to be faster than @Waldi's answer (at least it was for me).
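
The same "keep the first row per attribute combination" idea can probably be written more compactly. The sketch below shows two variants on the example data from the question: one indexes the sf object with base R's `duplicated()` on the geometry-free attribute table, the other uses `slice(1)` on the grouped object. The names `attrs`, `first_by_duplicated` and `first_by_slice` are introduced here for illustration, and I have not benchmarked either variant against the code above.

```r
library(sf)
library(dplyr)

# Same example data as in the question
Areas <- st_as_sf(
  tibble(
    Area = c("Zone1", "Zone1", "Zone2", "Zone1"),
    Zone = c("Area27", "Area27", "Area42", "Area27"),
    lng  = c(20.1, 20.2, 20.1, 20.1),
    lat  = c(-1.1, -1.2, -1.1, -1.1)
  ),
  coords = c("lng", "lat")
) %>%
  st_buffer(100)

# Variant 1: flag rows whose Area/Zone values repeat an earlier row,
# then keep the unflagged rows of the original sf object (geometry retained)
attrs <- st_drop_geometry(Areas)
first_by_duplicated <- Areas[!duplicated(attrs[, c("Area", "Zone")]), ]

# Variant 2: keep the first row of each Area/Zone group, then drop the grouping
first_by_slice <- Areas %>%
  group_by(Area, Zone) %>%
  slice(1) %>%
  ungroup()
```

Both results are still sf objects, so the geometry of the first polygon in each group is carried along. The trailing `ungroup()` is optional; it only avoids passing a grouped object to later steps.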

Chris answered Sep 19 '25 10:09