Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to add legend for regional map with a legend describing associated labels using ggplot2?

Tags:

r

map

ggplot2

SpatialPoly Data: SpatialData

Yield Data: Yield Data

Code:

    ## Loading packages
    library(rgdal)
    library(plyr)
    library(maps)
    library(maptools)
    library(mapdata)
    library(ggplot2)
    library(RColorBrewer)
    library(foreign)  
    library(sp)

    ## Loading shapefiles and .csv files
    #Morocco <- readOGR(dsn=".", layer="Morocco_adm0")
    MoroccoReg <- readOGR(dsn=".", layer="Morocco_adm1")
    MoroccoYield <- read.csv(file = "F:/Purdue University/RA_Position/PhD_ResearchandDissert/PhD_Draft/Country-CGE/RMaps_Morocco/Morocco_Yield.csv", header=TRUE, sep=",", na.string="NA", dec=".", strip.white=TRUE)

    ## Reorder the data in the shapefile based on the category variable "ID_1" and change to dataframe
    MoroccoReg <- MoroccoReg[order(MoroccoReg$ID_1), ]
    MoroccoReg.df <- fortify(MoroccoReg)

    ## Add the yield impacts column to shapefile from the .csv file by "ID_1"
    ## Note that in the .csv file, I just added the column "ID_1" to match it with the shapefile
    MoroccoReg.df <- cbind(MoroccoReg.df,MoroccoYield,by = 'ID_1')

    ## Check the structure and contents of shapefile
    attributes(MoroccoReg.df)

    ## Define new theme for map
    ## I have found this function on the website
    theme_map <- function (base_size = 12, base_family = "") {
    theme_gray(base_size = base_size, base_family = base_family) %+replace% 
    theme(
    axis.line=element_blank(),
    axis.text.x=element_blank(),
    axis.text.y=element_blank(),
    axis.ticks=element_blank(),
    axis.ticks.length=unit(0.3, "lines"),
    axis.ticks.margin=unit(0.5, "lines"),
    axis.title.x=element_blank(),
    axis.title.y=element_blank(),
    legend.background=element_rect(fill="white", colour=NA),
    legend.key=element_rect(colour="white"),
    legend.key.size=unit(1.5, "lines"),
    legend.position="right",
    legend.text=element_text(size=rel(1.2)),
    legend.title=element_text(size=rel(1.4), face="bold", hjust=0),
    panel.background=element_blank(),
    panel.border=element_blank(),
    panel.grid.major=element_blank(),
    panel.grid.minor=element_blank(),
    panel.margin=unit(0, "lines"),
    plot.background=element_blank(),
    plot.margin=unit(c(1, 1, 0.5, 0.5), "lines"),
    plot.title=element_text(size=rel(1.8), face="bold", hjust=0.5),
    strip.background=element_rect(fill="grey90", colour="grey50"),
    strip.text.x=element_text(size=rel(0.8)),
    strip.text.y=element_text(size=rel(0.8), angle=-90) 
    )   
    }

    ## Plotting 

    MoroccoRegMap1 <- ggplot(data = MoroccoReg.df, aes(long, lat, group = group)) 
    MoroccoRegMap1 <- MoroccoRegMap1 + geom_polygon(aes(fill = A2Med_noCO2))
    MoroccoRegMap1 <- MoroccoRegMap1 + geom_path(colour = 'gray', linestyle = 2)
    #MoroccoRegMap <- MoroccoRegMap + scale_fill_gradient(low = "#CC0000",high = "#006600")
    MoroccoRegMap1 <- MoroccoRegMap1 + scale_fill_gradient2(name = "%Change in yield",low = "#CC0000",mid = "#FFFFFF",high = "#006600")
    MoroccoRegMap1 <- MoroccoRegMap1 + labs(title="SRES_A2, noCO2 Effect")
    MoroccoRegMap1 <- MoroccoRegMap1 + coord_equal() + theme_map()
    MoroccoRegMap1

Results:

Map

Question:

In Yield data, I have a column that describes the labels corresponding to each of the entries in the column "ID_1". What I am trying to achieve is two things:

1) plot the map and add the "ID_1" variable entries as labels on the map, thus identifying each region;

2) generate a second legend, besides the one that captures the values in the data, and which the entries in "ID_1" and their corresponding description in the "Labels" column in the dataframe.

I hope I framed my question clearly.

thanks.

like image 270
iouraich Avatar asked Dec 04 '13 14:12

iouraich


1 Answers

First, let me apologize for taking so long to get back - I missed your comment among all the others. Is this what you had in mind?

This was produced with the following code. Before getting into an explanation, you should be aware that creating a legend is the least of your problems. Note how the colors are different in the two maps. Your code above does not assign CO2 changes to the correct regions. For example, according to MoroccoYields.csv, the largest change (improvement?) was -0.205 in Region 4, but on your map the largest (darkest red) is at the northeastern tip of Morocco, which is actually l'Oriental (Region 6). An explanation follows the code.

## Loading packages
library(rgdal)
library(plyr)
library(maps)
library(maptools)
library(mapdata)
library(ggplot2)
library(RColorBrewer)
library(foreign)  
library(sp)

# get.centroids: function to extract polygon ID and centroid from shapefile
get.centroids = function(x){
  poly = MoroccoReg@polygons[[x]]
  ID   = poly@ID
  centroid = as.numeric(poly@labpt)
  return(c(id=ID, long=centroid[1], lat=centroid[2]))
}
#setwd("Directory where shapefile and Yields are stored")
## Loading shapefiles and .csv files
MoroccoReg        <- readOGR(dsn=".", layer="Morocco_adm1")
MoroccoYield      <- read.csv(file = "Morocco_Yield.csv", header=TRUE, sep=",", na.string="NA", dec=".", strip.white=TRUE)
MoroccoYield$ID_1 <- substr(MoroccoYield$ID_1,3,10)

## Reorder the data in the shapefile based on the category variable "ID_1" and change to dataframe
MoroccoReg    <- MoroccoReg[order(MoroccoReg$ID_1), ]
MoroccoYield  <- cbind(id=rownames(MoroccoReg@data),MoroccoYield)
#  build table of labels for annotation (legend).
labs          <- do.call(rbind,lapply(1:14,get.centroids))
labs          <- merge(labs,MoroccoYield[,c("id","ID_1","Label")],by="id")
labs[,2:3]    <- sapply(labs[,2:3],function(x){as.numeric(as.character(x))})
labs$sort <- as.numeric(as.character(labs$ID_1))
labs          <- labs[order(labs$sort),]

MoroccoReg.df <- fortify(MoroccoReg)
## This does NOT work...
## Add the yield impacts column to shapefile from the .csv file by "ID_1"
## Note that in the .csv file, I just added the column "ID_1" to match it with the shapefile
#MoroccoReg.df <- cbind(MoroccoReg.df,MoroccoYield,by = 'ID_1')
## Do it this way...
MoroccoReg.df <- merge(MoroccoReg.df,MoroccoYield, by="id")

## Check the structure and contents of shapefile
attributes(MoroccoReg.df)
## Plotting 

MoroccoRegMap1 <- ggplot(data = MoroccoReg.df, aes(long, lat, group=id)) 
MoroccoRegMap1 <- MoroccoRegMap1 + geom_polygon(aes(fill = A2Med_noCO2))
MoroccoRegMap1 <- MoroccoRegMap1 + geom_path(colour = 'gray', linestyle = 2)
MoroccoRegMap1 <- MoroccoRegMap1 + scale_fill_gradient2(name = "%Change in yield",low = "#CC0000",mid = "#FFFFFF",high = "#006600")
MoroccoRegMap1 <- MoroccoRegMap1 + labs(title="SRES_A2, noCO2 Effect")
MoroccoRegMap1 <- MoroccoRegMap1 + coord_equal() #+ theme_map()
MoroccoRegMap1 <- MoroccoRegMap1 + geom_text(data=labs, aes(x=long, y=lat, label=ID_1), size=4)
MoroccoRegMap1 <- MoroccoRegMap1 + annotate("text", x=max(labs$long)-5, y=min(labs$lat)+3-0.5*(1:14),
                                            label=paste(labs$ID_1,": ",labs$Label,sep=""),
                                            size=3, hjust=0)
MoroccoRegMap1

Explanation:

First, on merging your yield data with the map regions: you use

MoroccoReg.df <- cbind(MoroccoReg.df,MoroccoYield,by = 'ID_1')

This is not how cbind(...) works. cbind(...) merely combines it's arguments column-wise. It is not a merge function. So you had a data frame, MoroccoReg.df, with 107,800 rows (one row for every line endpoint on your map), and you are combining it with MoroccoYield, which has 14 rows (1 for every Region). So cbind(...) replicates those 14 rows 7700 times to fill out the 107,800 rows it needs. The expression by="ID_1" merely adds another column named "by" with "ID_1" replicated 107,800 times. Run the statement above and type head(MoroccoReg.df) and look for the last column.

So how to do the merge? There are a number of functions in R that are supposed to make this easy, but I couldn't get any of them to work. This is what did work:

Every polygon in the shapefile has an ID. There is also an "ID_1" field in the shapefile data section, but these are different and unrelated. [BTW: The ID_1 field in the shapefile data section, and the ID_1 field in your csv file were also different: the latter has "TR" prepended to the region number; so that had to be dealt with as well]. Reordering the shapefile with:

MoroccoReg    <- MoroccoReg[order(MoroccoReg$ID_1), ]

will change the order of the polygons, but will not change their ID's. It turns out that the polygon ID matches the row name in the data section of the shapefile, so I prepended that (using cbind(...)!) to your MoroccoYeild data frame.

MoroccoYield  <- cbind(id=rownames(MoroccoReg@data),MoroccoYield)

So now MoroccoYield has an id field which maps to the polygon ID, and an ID_1 field, which identifies the Region. Now we can fortify(...) and merge(...). merge(...) does take a by= argument.

MoroccoReg.df <- fortify(MoroccoReg)
MoroccoReg.df <- merge(MoroccoReg.df,MoroccoYield, by="id")

This appends all of your MoroccoYield columns to the appropriate rows of MoroccoReg.df.

Creating the legend:

The obvious question is how to position the labels? Ideally, we would place the Region number from MoroccoYield$ID_1 at the centroid of each region, and then create a legend that identifies the Regions, based on MoroccoYield$Label.

So where to find the centroids? These are stored in an obscure location in the polygon section of the shapefile. To make a long story short, I created a utility function get.centroid(...) which extracts the centroid from a polygon. Then I applied that function to all the polygons to produce a table of centroids with corresponding polygon ID. Then I merged that with the labels in MoroccoYield. This created a data frame labs which has the following columns:

id:    polygon ID
long:  centroid longitude
lat:   centroid latitude
ID_1:  region ID
label: region name
sort:  a sortable (numeric) version of ID_1

Then, adding the following code to your ggplot...

...
MoroccoRegMap1 <- MoroccoRegMap1 + geom_text(data=labs, aes(x=long, y=lat, label=label.id), size=4)
MoroccoRegMap1 <- MoroccoRegMap1 + annotate("text", x=max(labs$long)-5, y=min(labs$lat)+3-0.5*(1:14),
                                            label=paste(labs$label.id,": ",labs$Label,sep=""),
                                            size=3, hjust=0)

...creates the map. There's no clean way, that I could find, to do this with a formal "ggplot legend", so I had to use annotate(...). Positioning the annotation is kind of a hack, but it seems to work.

Edit: In response to @smailov83's comment, if you change the code to create the ggplot to this...

MoroccoRegMap1 <- ggplot(data = MoroccoReg.df, aes(long, lat, group=group)) 
MoroccoRegMap1 <- MoroccoRegMap1 + geom_polygon(aes(fill = A2Med_noCO2))
MoroccoRegMap1 <- MoroccoRegMap1 + geom_path(colour = 'gray', linestyle = 2)
MoroccoRegMap1 <- MoroccoRegMap1 + scale_fill_gradient2(name = "%Change in yield",low = "#CC0000",mid = "#FFFFFF",high = "#006600")
MoroccoRegMap1 <- MoroccoRegMap1 + labs(title="SRES_A2, noCO2 Effect")
MoroccoRegMap1 <- MoroccoRegMap1 + coord_equal() #+ theme_map()
MoroccoRegMap1 <- MoroccoRegMap1 + geom_text(data=labs, aes(x=long, y=lat, label=ID_1, group=ID_1), size=4)
MoroccoRegMap1 <- MoroccoRegMap1 + annotate("text", x=max(labs$long)-5, y=min(labs$lat)+3-0.5*(1:14),
                                            label=paste(labs$ID_1,": ",labs$Label,sep=""),
                                            size=3, hjust=0)

...you get this:

Which I believe fixes the problem. The reason for the extra lines in your map was that the ggplot must be grouped by the column MoroccoReg.df$group (so, aes(..., group=group) not aes(...,group=id)). When you do this, however, ggplot tries to group by "group" in all layers. In geom_text(...), where we are using a new, local dataset - the labs data frame - there is no group column. To deal with this, we must explicitly set group to something else in geom_text(...). Bottom line: this seem to work.

like image 120
jlhoward Avatar answered Nov 15 '22 07:11

jlhoward