Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Plotting a shape file with ggplot2 error

Tags:

r

ggplot2

sas

Over the last few days I pretty much dove head first into using R for mapping. I have used R extensively for modelling etc. but not this kind of work before. I have some questions and issues regarding shapefiles, how they're read and so on.

I have downloaded the shape files from the Australian Bureau of Statistics there are numerous files with state borders, post codes, cities and so on. The shape files are massive, the Australian state borders has about 1.8 million coordinate points in it, the other file I tried was the statistical area which has over 8 million in it. I didn't do anything with this file as it is just too big for my R set up.

I read the shape file in with readShapePoly and converted it like so

AUS@data$id = rownames(AUS@data)    
AUS.points = fortify(AUS, region="id")
AUS.df = join(AUS.points, AUS@data, by="id")

Once I had converted the State borders shape file from SpatialPolygonsDataFrame to a regular dataframe I plotted it successfully but it took forever and the detail was too great. I thought to use thinnedSpatialPoly to simplify it but it gives the error:

Error in stopifnot(length(Sr@polygons) == nrow(data)) :trying to get slot "polygons" from an object of a basic class ("NULL") with no slots

Which google cannot help me with.

My next strategy was to read it into SAS and use proc greduce which takes the file and creates a density field and you can choose how dense the polygons are.

proc mapimport out=states datafile='\Digital Boundaries\States\Shape file\STE_2011_AUST.shp';
id ste_code11;    run;
proc greduce data = states out = reduced_states;
id ste_code11;    run;

SAS has crap graphics and couldn't even plot the thing for me so I exported the dataset and read it back into R with the new density field which I hoping to subset the dataframe by and use in my plots.

My problem now is that when I go to plot in R i get thisMap of Australian State Borders

ggplot(data=states.df, aes(X, Y, group=SEGMENT)) + 
geom_polygon(colour='black', fill='white') + theme_bw()

I guess it is because the polygons are not in order or have broken? I used this function to try and rejoin my polygons but still no luck

RegroupElements <- function(df, longcol, idcol){
g <- rep(1, length(df[,longcol]))
if (diff(range(df[,longcol])) > 300) { # check if longitude within group differs more                than 300 deg, ie if element was split
d <- df[,longcol] > mean(range(df[,longcol])) # we use the mean to help us separate the   extreme values
g[!d] <- 1 # some marker for parts that stay in place (we cheat here a little, as we do not take into account concave polygons)
g[d] <- 2 # parts that are moved 
}
g <- paste(df[, idcol], g, sep=".") # attach to id to create unique group variable for   the dataset
df$group.regroup <- g
df
}

### Function to close regrouped polygons
# Takes dataframe, checks if 1st and last longitude value are the same, if not, inserts  first as last and reassigns order variable
ClosePolygons <- function(df, longcol, ordercol){
if (df[1,longcol] != df[nrow(df),longcol]) {
tmp <- df[1,]
df <- rbind(df,tmp)
}
o <- c(1: nrow(df)) # rassign the order variable
df[,ordercol] <- o
df
}  

So, finally my questions! How do people deal with large overly detailed shape files? Why wasn't thinnedspatialpoly working (I'd like to avoid SAS if possible)? How can I get my plot to not look like crap?

Finally my R specs:

R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252   
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
[5] LC_TIME=English_Australia.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] gridExtra_0.9   gpclib_1.5-1    ggmap_2.1       maptools_0.8-16
[5] lattice_0.20-6  rgeos_0.2-7     plyr_1.7.1      stringr_0.6    
[9] ggplot2_0.9.1   sp_0.9-99       shapefiles_0.6  foreign_0.8-50 
[13] fastshp_0.1-0  

loaded via a namespace (and not attached):
[1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       labeling_0.1      
[5] MASS_7.3-18        memoise_0.1        munsell_0.3        png_0.1-4         
[9] proto_0.3-9.2      RColorBrewer_1.0-5 reshape2_1.2.1     RgoogleMaps_1.2.0 
[13] rjson_0.2.8        scales_0.2.1       tools_2.15.1   
like image 738
dom_oh Avatar asked Jul 11 '12 08:07

dom_oh


1 Answers

First, if you are diving in head first, don't go into the shallow end.

My fairly old PC can read the state digital boundaries in shapefile format no problem:

aus=readOGR(".","STE_2011_AUST")
plot(aus)

but the map is clearly over detailed. I also loaded it into Quantum GIS so I could have a good old zoom and pan around, and every tiny island is on there. I think its one of the most detailed country level maps I've ever seen. So secondly, you might want to try and find a readily-simplified map of the states (see www.gadm.org for possibles).

So lets see if gSimplify from package:rgeos helps:

aus2 = gSimplify(aus,0.1)
plot(aus2)

that removes a lot of the tiny islands but sadly for me (and a large chunk of the population) it removes New South Wales as well. Not good. If I lower the tolerance eventually I can get something that keeps NSW:

aus2 = gSimplify(aus,0.01)
plot(aus2)

but clearly there's some issue with gSimplify or the shapefile data itself. Anyway, if I save aus2 back to a shapefile there's a massive reduction in size, the .shp being 180k instead of 29 megabytes.

Also, I'd stick to plotting with base graphics.

like image 168
Spacedman Avatar answered Nov 02 '22 16:11

Spacedman