Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

keep region names when tidying a map using broom package

I am using the getData function from the raster package to retrieve the map of Argentina. I would like to plot the resulting map using ggplot2, so I am converting to a dataframe using the tidy function from the broom package. This works fine, but I can't figure out how to preserve the names of the federal districts so that I can use them on the map.

Here is my original code that does not preserve the district names:

# Original code: ##################################
# get the map data from GADM.org and then simplify it
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/")     %>% 
  # simplify
  rmapshaper::ms_simplify(keep = 0.01) %>% 
  # tidy to a dataframe
  broom::tidy()

# plot the map
library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
       color="#000000", size=0.25)

And here is the code with a hack to pull the district names out of the SPDF and use them as the map IDs:

# Code with a hack to keep the district names: ################################
# get the map data from GADM.org and then simplify it
arg_map_1 <- raster::getData(country = "ARG", level = 1, path = "./data/") %>% 
  # simplify
  rmapshaper::ms_simplify(keep = 0.01)  

for(region_looper in seq_along(arg_map_1@data$NAME_1)){
  arg_map_1@polygons[[region_looper]]@ID <- 
    as.character(arg_map_1@data$NAME_1[region_looper]) 
}

# tidy to a dataframe
arg_map_1 <- arg_map_1 %>% 
  broom::tidy()

library(ggplot2)
ggplot(data=arg_map_1) +
  geom_map(map=arg_map_1, aes(x=long, y=lat, map_id=id, fill=id),
           color="#000000", size=0.25)

I keep thinking that there must be some way to use the tidy function that preserves the names, but for the life of me, I can't figure it out.

like image 346
jkgrain Avatar asked Feb 06 '23 17:02

jkgrain


1 Answers

You can use the join function from package plyr. Here is a general solution (it looks long but it is actually very easy):

  1. Load shapefile: Let us say you have a shapefile my_shapefile.shp in your working directory. Let's load it:

    shape <- readOGR(dsn = "/my_working_directory", layer = "my_shapefile")
    

    Notice that inside this shapefile there is a dataframe, which can be accessed with shape@data. For example, this dataframe could look like this:

    > head(shape@data)
           code                   region     label
    0 E12000006          East of England E12000006
    1 E12000007                   London E12000007
    2 E12000002               North West E12000002
    3 E12000001               North East E12000001
    4 E12000004            East Midlands E12000004
    5 E12000003 Yorkshire and The Humber E12000003
    
  2. Create new dataframe from shapefile: Use the broom package to tide the shapefile dataframe:

    new_df <- tidy(shape)
    

This results in something like this:

> head(new_df)
      long      lat order  hole piece group id           
1 547491.0 193549.0     1 FALSE     1   0.1  0 
2 547472.1 193465.5     2 FALSE     1   0.1  0 
3 547458.6 193458.2     3 FALSE     1   0.1  0 
4 547455.6 193456.7     4 FALSE     1   0.1  0 
5 547451.2 193454.3     5 FALSE     1   0.1  0 
6 547447.5 193451.4     6 FALSE     1   0.1  0

Unfortunately, tidy() lost the variable names ("region", in this example). Instead, we got a new variable "id", starting at 0. Fortunately, the ordering of "id" is the same as that stored in shape@data$region. Let us use this to recover the names.

  1. Create auxiliary dataframe with row names: Let us create a new dataframe with the row names. Additionally, we will add an "id" variable, identical to the one tidy() created:

    # Recover row name 
    temp_df <- data.frame(shape@data$region)
    names(temp_df) <- c("region")
    # Create and append "id"
    temp_df$id <- seq(0,nrow(temp_df)-1)
    
  2. Merge row names with new dataframe using "id": Finally, let us put the names back into the new dataframe:

    new_df <- join(new_df, temp_df, by="id")
    

That's it! You can even add more variables to the new dataframe, by using the join command and the "id" index. The final result would be something like:

> head(new_df)
      long      lat order  hole piece group id            name    var1    var2 
1 547491.0 193549.0     1 FALSE     1   0.1  0 East of England   0.525   0.333   
2 547472.1 193465.5     2 FALSE     1   0.1  0 East of England   0.525   0.333   
3 547458.6 193458.2     3 FALSE     1   0.1  0 East of England   0.525   0.333   
4 547455.6 193456.7     4 FALSE     1   0.1  0 East of England   0.525   0.333   
5 547451.2 193454.3     5 FALSE     1   0.1  0 East of England   0.525   0.333   
6 547447.5 193451.4     6 FALSE     1   0.1  0 East of England   0.525   0.333   
like image 133
luchonacho Avatar answered Feb 08 '23 15:02

luchonacho