I have a georeferenced event data set, in a data frame of the form:
LONGITUDE LATITUDE VAR1
33.4 4.4 5
33.4 4.4 3
33.4 4.4 1
30.4 4.2 2
28.4 5.1 2
It counts fatalities in events which are georeferenced. Apart from it, I have a shape file of the provinces in a country, like this:
> str(shapefile)
'data.frame': 216 obs. of 5 variables:
$ CONSTI_COD: num 1 2 3 4 5 6 7 8 9 10 ...
$ Area : num 20 11.7 10.7 223.3 38.7 ...
$ PROVINCE_NAME : Factor w/ 216 levels "CENTRAL","COAST",..: 4 4 4 4 4 4 4 4 2 2 ...
$ Shape_Leng: num 0.193 0.152 0.201 0.872 0.441 ...
$ Shape_Area: num 0.001628 0.000947 0.000867 0.018135 0.003145 ...
..@ polygons :List of 216
.. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
.. .. .. ..@ Polygons :List of 1
.. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
.. .. .. .. .. .. ..@ labpt : num [1:2] 36.9 -1.3
.. .. .. .. .. .. ..@ area : num 0.00163
.. .. .. .. .. .. ..@ hole : logi FALSE
.. .. .. .. .. .. ..@ ringDir: int 1
.. .. .. .. .. .. ..@ coords : num [1:151, 1:2] 36.8 36.8 36.8 36.9 36.9 ...
.. .. .. ..@ plotOrder: int 1
.. .. .. ..@ labpt : num [1:2] 36.9 -1.3
.. .. .. ..@ ID : chr "0"
.. .. .. ..@ area : num 0.00163
[...etc]
What I need to do is to place the event data within the provinces, i.e. add a fourth column to the first data frame which states in which province happened each event, based on the coordinates. So I would have something like this:
LONGITUDE LATITUDE VAR1 PROVINCE
33.4 4.4 5 CENTRAL
33.4 4.4 3 CENTRAL
33.4 4.4 1 CENTRAL
30.4 4.2 2 COAST
28.4 5.1 2 COAST
Is this posible? I think I found some time ago a post that explained how to do this (outside Stack Overflow), but I cannot find it now.
Thanks!
(Sorry if there is a similar question over here. I made a search, but I didn't find an answer, maybe because I don't really know what I'm looking for. I would really appreciate a link to a similar post.)
What you're talking about is a "spatial join" (or "spatial intersection" or "overlay"). This is pretty straightforward with the help of the over
function from the sp
package.
Here's an example.
First, let's download and import a polygon shapefile of the world's countries.
download.file(paste0('http://www.naturalearthdata.com/http//',
'www.naturalearthdata.com/download/110m/cultural/',
'ne_110m_admin_0_countries.zip'),
f <- tempfile())
unzip(f, exdir=tempdir())
library(rgdal)
countries <- readOGR(tempdir(), 'ne_110m_admin_0_countries')
Now we'll create some random coordinate data that fall within the extent of the polygon shapefile. We then define the columns x
and y
as coordinates
, and assign the same CRS as that of the polygons (although this may not be the case for your data; be sure to assign correct coordinate systems).
pts <- data.frame(x=runif(10, -180, 180), y=runif(10, -90, 90),
VAR1=LETTERS[1:10])
coordinates(pts) <- ~x+y # pts needs to be a data.frame for this to work
proj4string(pts) <- proj4string(countries)
plot(countries)
points(pts, pch=20, col='red')
Now we can perform the spatial overlay:
over(pts, countries)$admin
# [1] <NA> <NA> Turkey <NA>
# [5] Macedonia <NA> China Argentina
# [9] <NA> Canada
# 177 Levels: Afghanistan Albania ... Zimbabwe
Note that in this case, some of the random points fell in the ocean (i.e. outside polygons). When intersected with the polygon object, these points return NA.
Now we cbind
the desired attribute to pts
:
cbind.data.frame(pts, country=over(pts, countries)$admin)
# x y VAR1 country
# 1 -52.59404 -37.422879 A <NA>
# 2 -33.88867 -40.194482 B <NA>
# 3 38.84383 37.272460 C Turkey
# 4 -84.04949 7.118878 D <NA>
# 5 20.98272 40.920470 E Macedonia
# 6 -155.32951 -37.612497 F <NA>
# 7 99.40166 38.630049 G China
# 8 -61.84025 -27.412885 H Argentina
# 9 -37.65287 -3.666080 I <NA>
# 10 -112.81197 59.959475 J Canada
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With