
Improve centering county names ggplot & maps

Earlier I posted a question about plotting county names on a map using ggplot and maps. My first approach was to take the mean of all the lat and long coordinates per county.

Thankfully Andrie had 2 suggestions to improve the centering: use the center of the coordinate ranges, and add coord_map() (which appears to keep the aspect ratio correct). This improved the centering a great deal.

I think this looks better, but there are still overlap problems. I am hoping to improve the centering further (in that same thread Justin suggested a kmeans approach). I am OK with rotating text where needed: ideally each name is centered and, if it would extend beyond the county borders, rotated so that the county names display as well as possible on the map.

Any ideas?

library(ggplot2); library(maps)

county_df <- map_data('county')  #mappings of counties by state
ny <- subset(county_df, region=="new york")   #subset just for NYS
ny$county <- ny$subregion
p <- ggplot(ny, aes(long, lat, group=group)) +  geom_polygon(colour='black', fill=NA)

#my first approach to centering
cnames <- aggregate(cbind(long, lat) ~ subregion, data=ny, FUN=mean)
ggplot(ny, aes(long, lat)) +  
    geom_polygon(aes(group=group), colour='black', fill=NA) +
    geom_text(data=cnames, aes(long, lat, label = subregion), size=3)

#Andrie's much improved approach to centering
cnames <- aggregate(cbind(long, lat) ~ subregion, data=ny, 
                    FUN=function(x)mean(range(x)))
ggplot(ny, aes(long, lat)) +  
    geom_polygon(aes(group=group), colour='black', fill=NA) +
    geom_text(data=cnames, aes(long, lat, label = subregion), size=3) +
    coord_map()

Tyler Rinker asked Feb 25 '12 06:02


2 Answers

I worked this out last night over at Talk Stats, and (after the hours I spent on it into the early morning!) it turns out to be pretty easy if you use the R spatial package (sp). I tested some of its functions to create a SpatialPolygons object, on which you can call coordinates to return a polygon centroid. I only did it for one county, but the label point of a Polygon (S4) object matched that centroid. Assuming this holds in general, the label points of Polygon objects are centroids. I use this little process to create a data frame of centroids and then plot them on a map.

library(ggplot2)  # For map_data. It's just a wrapper; should just use maps.
library(sp)
library(maps)
# Returns the label point (long, lat) of a single county's polygon
getLabelPoint <- function(county) {
    Polygon(county[c('long', 'lat')])@labpt
}

df <- map_data('county', 'new york')                 # NY region county data
centroids <- by(df, df$subregion, getLabelPoint)     # Returns list
centroids <- do.call("rbind.data.frame", centroids)  # Convert to Data Frame
names(centroids) <- c('long', 'lat')                 # Appropriate Header

map('county', 'new york')
text(centroids$long, centroids$lat, rownames(centroids), offset=0, cex=0.4)
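
One way to check the assumption that label points are centroids is to compute the area-weighted centroid directly and compare. The following is a minimal base-R sketch using the standard shoelace formula; `polygon_centroid` is a hypothetical helper name, not part of sp or maps, and it assumes a single simple ring (no holes, no separate islands).

```r
# Hypothetical helper (not from sp or maps): area-weighted centroid of one
# simple polygon ring, computed with the shoelace formula.
polygon_centroid <- function(x, y) {
  # Close the ring if the last vertex does not repeat the first
  if (x[1] != x[length(x)] || y[1] != y[length(y)]) {
    x <- c(x, x[1])
    y <- c(y, y[1])
  }
  i <- seq_len(length(x) - 1)
  cross <- x[i] * y[i + 1] - x[i + 1] * y[i]  # signed cross product per edge
  area <- sum(cross) / 2                      # signed area of the ring
  c(long = sum((x[i] + x[i + 1]) * cross) / (6 * area),
    lat  = sum((y[i] + y[i + 1]) * cross) / (6 * area))
}

# Unit square with one extra vertex on the bottom edge: the vertex mean is
# pulled toward that edge, while the area-weighted centroid is not.
sq_x <- c(0, 0.5, 1, 1, 0)
sq_y <- c(0, 0,   0, 1, 1)
vertex_mean <- c(long = mean(sq_x), lat = mean(sq_y))
ctr <- polygon_centroid(sq_x, sq_y)
```

The toy square also shows why averaging the vertices can be off-center: borders digitized with many points pull the mean toward themselves. Applied per county, something like `by(df, df$subregion, function(d) polygon_centroid(d$long, d$lat))` could be compared against the label points, with the caveat that counties drawn as several rings would need each ring handled separately.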

The label-point approach will not work well for every polygon. Labeling and annotation in GIS very often require manual adjustment for the peculiar cases that do not fit the automatic (systematic) approach you want to use. The usual code-look-recode cycle is not well suited to this. It is better to include a check that a label of a given size, for the given plot, will fit within the polygon; if it does not, remove it from the set of text labels and insert it manually later to suit the situation, e.g., add a leader line and annotate to the side of the polygon, or turn the label sideways as was displayed elsewhere.
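
A very rough sketch of such a fit check is shown here. It only compares an estimated label width with the county's longitude extent; `label_fits` and the `char_width` value (the assumed width of one character in degrees of longitude) are hypothetical and would need tuning for the actual plot size and font.

```r
# Hypothetical helper: flag counties whose name is probably too wide to fit.
# char_width is an ASSUMED width of one character in degrees of longitude;
# it depends on the device size and the text size, so tune it by eye.
label_fits <- function(df, char_width = 0.04) {
  # East-west extent of each county, in degrees of longitude
  widths <- tapply(df$long, df$subregion, function(x) diff(range(x)))
  needed <- nchar(names(widths)) * char_width  # estimated label width
  data.frame(subregion = names(widths),
             width     = as.vector(widths),
             needed    = needed,
             fits      = as.vector(widths) >= needed,
             row.names = NULL)
}

# Toy data: one wide and one narrow rectangular "county"
toy <- data.frame(
  subregion = rep(c("wide", "st lawrence"), each = 4),
  long      = c(0, 1, 1, 0,   2, 2.2, 2.2, 2),
  lat       = c(0, 0, 1, 1,   0, 0,   1,   1)
)
fits <- label_fits(toy)
```

Rows with `fits == FALSE` are the candidates to drop from the automatic labels and place by hand. A bounding-box test like this is optimistic for irregular shapes, so it is only a first filter.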

Bryan Goodrich answered Oct 25 '22 02:10


This was a very helpful discussion. For the benefit of those who grew up with dplyr, here is a minor tweak, using pipes in place of aggregate:

library(maps); library(dplyr); library(ggplot2)
ny <- map_data('county', 'new york') 

cnames1 <- aggregate(cbind(long, lat) ~ subregion, data=ny, 
                     FUN=function(x)mean(range(x)))
cnames2 <- ny %>% group_by(subregion) %>%
    summarize_at(vars(long, lat), ~ mean(range(.)))

all.equal(cnames1, as.data.frame(cnames2))
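
In dplyr 1.0.0 and later, `summarize_at()` is superseded by `across()`. The sketch below shows the same range-midpoint summary written that way, on a small made-up data frame (the `toy` values are invented for illustration, not taken from the county data).

```r
library(dplyr)

# Made-up coordinates for two "counties" (illustration only)
toy <- data.frame(
  subregion = c("a", "a", "a", "b", "b"),
  long      = c(0, 1, 4, 10, 12),
  lat       = c(0, 2, 2, 5, 9)
)

# Base R: midpoint of the coordinate range per county
mid_base <- aggregate(cbind(long, lat) ~ subregion, data = toy,
                      FUN = function(x) mean(range(x)))

# dplyr with across(): same summary for both columns
mid_dplyr <- toy %>%
  group_by(subregion) %>%
  summarize(across(c(long, lat), ~ mean(range(.x))))

same <- all.equal(mid_base, as.data.frame(mid_dplyr))
```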

Robert McDonald answered Oct 25 '22 01:10