Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove line from polygon crossing the international dateline in R (e.g. Russia in rnaturalearth)

Tags:

r

mapping

sf

Question: Polygons that cross the international dateline frequently have a North-South line through them. Eastern Russia in the rnaturalearth package is a good example of this, but I have also encountered it with other spatial data. I would like to be able to remove this line for plotting.

Attempts: I primarily use the sf package in R for mapping. I have tried various solutions involving st_union, st_combine, st_wrap_dateline, st_remove_holes, as well as using functions from other packages such as aggregate, merge, and gUnaryUnion, but my efforts have been fruitless so far.

Example: The following code demonstrates the problem lines in Russia along the international dateline using the popular rnaturalearth package.

library(tidyverse)
library(rnaturalearth)
library(sf)

#Import data
world <- ne_countries(scale = "medium",
                       returnclass = "sf") 

#I use the Alaska albers projection for this map,
#limit extent (https://spatialreference.org/ref/epsg/nad83-alaska-albers/)
xmin <- -2255938
xmax <- 1646517
ymin <- 449981
ymax <- 2676986

#plot
ggplot()+
  geom_sf(data=world, color="black", size=1)+
  coord_sf(crs=3338)+
  xlim(c(xmin,xmax))+ylim(c(ymin,ymax))+
  theme_bw()

Thanks!

like image 687
Matt Callahan Avatar asked Oct 15 '22 20:10

Matt Callahan


1 Answers

Short answer

EPSG:3338 is the problem - use a UTM (326XX or 327XX) code instead.

Long answer

My gut feeling is this is related to the challenges of projecting geographic (long-lat) data to a flat surface - either a projected CRS, or more simply the flat surface of the plot viewer pane in RStudio.

We know that on a ellipsoidal model of Earth, the (minimum) on-ground distance between longitudes of -179 and +179 is the same as the distance between -1 and +1, a distance of 2 degrees. However from a numerical perspective, these two lines of longitude have a distance of 358 degrees between them.

Imagine you are an alien (or a flat-earther) and looking at the following projection of world, and you didn't know that Earth was ellipsoidal in shape (or you didn't know this was a projection). You would be forgiven for thinking that to get from one part of Russia (red) to the other, you would have to get wet. I guess by default, ggplot is a flat-earther.

enter image description here

Imagine each polygon in the above plot is a piece of a jigsaw. In your plot, I guess you are setting the origin to the centre of EPSG:3338 (coord_sf(crs = 3338)), which I think is somewhere in Alaska/Canada? (I'm guessing here as I don't use this notation, rather I prefer to transform data before sending to ggplot). Regardless, ggplot knows it should rearrange it's 'puzzle pieces', so longitude -179 and +179 are next to each other - but this is purely visual, as in your plot:

enter image description here

So, my guess is that when you try and use st_union() or st_simplify(), the polygons aren't actually next to each other in space so are not joined. This is where a projected CRS should solve the problem, transforming the coords to values relative to an origin other than (long 0, lat 0).

This I think is one source of trouble for you - a quick google of EPSG:3338 says it is good for Alaska, but no mention of Russia. The first thing that came up when I googled 'utm russia' was EPSG:32635. So, let's take a look at the values for longitude for EPSG codes 4326 (WGS84 longlat), 3338 (NAD83 Alaska) and 32635.

# pull out russia
world %>% 
  filter(
    str_detect(name_long, 'Russia')
  ) %>% 
  select(name_long, geometry) %>% 
  {. ->> russia}

# extract coords of each projection
russia %>% 
  st_transform(3338) %>% 
  {. ->> russia_3338} %>% 
  st_coordinates %>% 
  as_tibble %>% 
  select(X) %>% 
  mutate(
    crs = 'utm_3338'
  ) %>% 
  {. ->> russia_coords_3338}

russia %>% 
  st_transform(4326) %>% 
  {. ->> russia_4326} %>% 
  st_coordinates %>% 
  as_tibble %>% 
  select(X) %>% 
  mutate(
    crs = 'utm_4326'
  ) %>% 
  {. ->> russia_coords_4326}

russia %>% 
  st_transform(32635) %>% 
  {. ->> russia_32635} %>% 
  st_coordinates %>% 
  as_tibble %>% 
  select(X) %>% 
  mutate(
    crs = 'utm_32635'
  ) %>% 
  {. ->> russia_coords_32635}

Let's combine them and look at a histogram of longitude values

# inspect X coords on a histogram
bind_rows(
  russia_coords_3338,
  russia_coords_4326,
  russia_coords_32635,
) %>% 
  ggplot(aes(X))+
  geom_histogram()+
  facet_wrap(~crs, ncol = 1, scales = 'free')

enter image description here

So, as you can see projections 4326 and 3338 have 2 distinct groups of coords at either ends of the earth, with a big break (spanning x = 0) in between. Projection 32635 though, has only one group of coords, suggesting that the 2 parts of Russia, according to this projection, are numerically positioned next to each other. Projection 32635 works because it transforms the coords into '(minimum?) distance from an origin'; the origin of which (unlike long-lat coords) is not on the opposite side of the world and doesn't need to go 2 different directions around the globe to determine minimum distance to either end of the country (this is what causes the break in longitude coords for the other 2 projections). I don't know enough about EPSG:3338 to explain why it does this too, but suspect it's because it is Alaska-focused so they didn't consider crossing the 180th meridian.

If we plot russia_32635 we can see these pieces are next to each other, but remember we don't trust ggplot just yet. When we use st_simplify() this date line (red) disappears, proving that the 2 polygons are next to each other and can be simplified/unioned.

ggplot()+
  geom_sf(data = russia_32635, colour = 'red')+
  geom_sf(data = russia_32635 %>% st_simplify, fill = NA)

enter image description here

st_simplify() has dissolved the 2 boundaries on the date line, reducing our number of individual polygons from 100 to 98.

russia_32635 %>% 
  st_cast('POLYGON')

# Simple feature collection with 100 features and 1 field
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 21006.08 ymin: 4772449 xmax: 6273473 ymax: 13233690
# Projected CRS: WGS 84 / UTM zone 35N


russia_32635 %>% 
  st_simplify %>% 
  st_cast('POLYGON')

# Simple feature collection with 98 features and 1 field
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 21006.08 ymin: 4772449 xmax: 6273473 ymax: 13233690
# Projected CRS: WGS 84 / UTM zone 35N

Alternatively, it looks like st_union(..., by_feature = TRUE) also works - see ?st_union:

If by_feature is TRUE each feature geometry is unioned. This can for instance be used to resolve internal boundaries after polygons were combined using st_combine.

russia_32635 %>% 
  st_union(by_feature = TRUE) %>% 
  st_cast('POLYGON')

# Simple feature collection with 98 features and 1 field
# Geometry type: POLYGON
# Dimension:     XY
# Bounding box:  xmin: 21006.08 ymin: 4772449 xmax: 6273473 ymax: 13233690
# Projected CRS: WGS 84 / UTM zone 35N

So, technically there is your plot of Russia without the date line. I think Russia is tricky to plot because a) it's close to the poles, and b) it covers such a vast area meaning most projections are going to skew from one end of the country to another.

However to me, it makes sense to orient the plot 'north-up'. A way to do this is to make your own 'Mollweide' projection and assign the origin to the approximate centre of Russia (lon 99, lat 65). Without st_buffer(0), this plots with the date line for some reason (see here and here for examples, and section 6.5 here for explanation).

my_proj <- '+proj=moll +lon_0=99 +lat_0=65 +units=m'

russia_32635 %>% 
  st_buffer(0) %>% 
  st_transform(crs(my_proj)) %>%
  st_simplify %>% 
  ggplot()+
  geom_sf()

enter image description here

Bonus

I tried plotting russia_32635 %>% st_simplify with tmap and leaflet, but did not get desired results. I assume this is because these packages prefer geographic (lon-lat) coords; leaflet only accepts longlat format as far as I can tell, and although tmap can certainly handle projected data, my guess is that under the bonnet it transforms it (or similar) to it's preferred projection. Workarounds look to be available at the same links as above if you really really want this visualisaiton (here, here and here).

library(tmap)

russia_32635 %>% 
  st_simplify %>% 
  tm_shape()+
  tm_polygons()


library(leaflet)

russia_32635 %>% 
  st_simplify %>%
  st_transform(4326) %>% # because leaflet only works with longlat projections
  leaflet %>% 
  addTiles %>% 
  addPolygons()

Ultimately, you can only preserve 2/3 primary characteristics when projecting data: area, direction or distance. This is made even more obvious when projecting something as big and polar as Russia. Hopefully one of these options is suitable for your problem.

like image 87
hugh-allan Avatar answered Oct 20 '22 16:10

hugh-allan