
how to merge a shapefile with a dataframe with latitude/longitude data

Tags: r, sf

I am struggling with the following issue.

I have downloaded the PLUTO NYC Manhattan shapefile for the NYC tax lots from https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page

I am able to read it into sf with a simple st_read:

> mydf
Simple feature collection with 42638 features and 90 fields
geometry type:  MULTIPOLYGON
dimension:      XY
bbox:           xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID):    NA
proj4string:    +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
   Borough Block  Lot  CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1       MN  1545   52 108    138   4000         02       5   10028     E022         19         13       3700

My problem is the following: I have a dataframe as follows

> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
        lat        lon
      <dbl>      <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285

I would like to merge this data to the mydf dataframe above, so that I can count how many latitude/longitude observations I have within each tax lot (remember, mydf is at the tax lot granularity), and plot the corresponding map of it. I need to do so using sf.

In essence something similar to

pol <- mydf %>% select(SchoolDist)
plot(pol)

but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.

Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.

I think there is an easy way to do it, but I was not able to find it. Thanks!

ℕʘʘḆḽḘ asked May 02 '18 17:05




2 Answers

This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:

  1. Use the built in nc dataset and transform to 3857 crs, which is projected rather than lat-long (avoids a warning in st_contains)
  2. Create 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat long columns into sf points.
  3. Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. The sgbp objects created by a geometry predicate basically answer "for each geometry in sf x, which indices of geometries in sf y satisfy the predicate". So lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon.
  4. Once the counts are in the sf object as a column, we can just select and plot them with the plot.sf method.
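
To make step 3 concrete, here is a minimal toy sketch (not part of the original answer) of what an sgbp object looks like. The two unit squares, the three points, and the helper name sq are hypothetical and carry no CRS; only st_contains and lengths come from the steps above.

```r
library(sf)

# Hypothetical helper: a unit square whose lower-left corner is (x0, 0).
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
polys <- st_sfc(sq(0), sq(1))

# Three toy points: two inside the first square, one inside the second.
pts <- st_sfc(st_point(c(0.2, 0.5)), st_point(c(0.7, 0.5)), st_point(c(1.5, 0.5)))

# sgbp: a list with one element per polygon, holding the indices of the
# points that the polygon contains.
hits <- st_contains(polys, pts)

print(hits[[1]])      # indices of the points inside the first square
print(lengths(hits))  # number of points per polygon
```

Because the result has one element per polygon, in polygon order, lengths(hits) can be assigned straight into a new column of the polygon layer.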

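For your data, simply replace nc with mydf and leave out the call to tibble; instead, use your data.frame with the right lat long pairs. Since those pairs are in degrees, create the points with crs = 4326 and st_transform them to the CRS of mydf before calling st_contains.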

library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package="sf") %>%
  read_sf() %>%
  st_transform(3857)
set.seed(1000)
points <- tibble(
  x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
  y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
  st_as_sf(coords = c("x", "y"), crs = 3857)

plot(nc$geometry)
plot(points$geometry, add = TRUE)

nc %>%
  mutate(pt_count = lengths(st_contains(nc, points))) %>%
  select(pt_count) %>%
  plot()

Created on 2018-05-02 by the reprex package (v0.2.0).
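
For a data frame of latitude/longitude pairs like the one in the question, the same idea could be sketched as follows. This is an illustration under assumptions, not a tested solution for the PLUTO data: the nc polygons stand in for mydf, the three coordinate pairs are made up, and the points are assumed to be WGS84 (EPSG:4326).

```r
library(sf)

# Stand-in for mydf: the nc polygons shipped with sf, in a projected CRS.
polys <- st_transform(read_sf(system.file("shape/nc.shp", package = "sf")), 3857)

# Hypothetical lat/lon observations; replace with the real data frame.
obs <- data.frame(lat = c(35.5, 35.5, 35.5), lon = c(-80, -80.1, -82))

# Assumption: the coordinates are WGS84, so declare them as EPSG:4326 first,
# then reproject them to the CRS of the polygons.
pts <- st_as_sf(obs, coords = c("lon", "lat"), crs = 4326)
pts <- st_transform(pts, st_crs(polys))

# One count per polygon, in polygon order.
polys$pt_count <- lengths(st_contains(polys, pts))
plot(polys["pt_count"])
```

The order matters: st_as_sf only labels the coordinates with a CRS, while st_transform actually converts them, so passing the polygons' projected CRS directly to st_as_sf would place degree values in a coordinate system measured in metres or feet.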

Calum You answered Oct 08 '22 17:10


I tried this on your data, but the intersection is empty for both sets of points you provided. However, the code should work.

EDIT: Simplified group_by + mutate with add_count:

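library(sf)
library(dplyr)

mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
## the coordinates are longitude/latitude, so declare them as such
## (assumed here to be WGS84, EPSG:4326) ...
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=4326)
## ... then st_transform to the CRS of the shapefile, so that both layers
## share a common CRS, as Calum You suggests
xysf = st_transform(xysf, st_crs(mydf))
xysf %>% 
    sf::st_intersection(mydf) %>% 
    dplyr::add_count(LOT)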

Reproducible example:

library(sf)
library(dplyr)

nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncpts = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)), 
           coords=c('lon', 'lat'), crs=sf::st_crs(nc))
## first attempt: intersect the points with the polygons, then count per FIPS
ncxy = ncpts %>% 
           sf::st_intersection(nc) %>%
           dplyr::add_count(FIPS)

## a better approach: spatial join on the original points (overwrites ncxy)
ncxy = ncpts %>%
           sf::st_join(nc, join=sf::st_intersects) %>%
           dplyr::add_count(FIPS)

The new column n includes the total number of points per FIPS code.

ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes 
  that they are planar
  # A tibble: 2 x 2
  # Groups:   FIPS [2]
    FIPS     n
   <fctr> <int>
  1  37123     2
  2  37161     1
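
The counts above are attached to the points, whereas the question asks for a map of the polygons. One possible way to get there (a sketch, not part of the original answer) is to tally the joined points per FIPS and join that table back onto nc. The names pts, counts, nc_counts and pt_count are illustrative, and polygons without any point are assumed to deserve a count of 0.

```r
library(sf)
library(dplyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
pts <- st_as_sf(data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5)),
                coords = c("lon", "lat"), crs = st_crs(nc))

# Tally points per polygon key; dropping the geometry leaves a plain table.
counts <- pts %>%
  st_join(nc, join = st_intersects) %>%
  st_drop_geometry() %>%
  count(FIPS, name = "pt_count")

# Attach the counts to the polygons; polygons without points get 0.
nc_counts <- nc %>%
  left_join(counts, by = "FIPS") %>%
  mutate(pt_count = coalesce(pt_count, 0L))

plot(nc_counts["pt_count"])
```

Keeping nc on the left side of the join preserves every polygon and its geometry, so the result is still an sf object that plot can draw.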

I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.

HT: st_join approach from this answer.

juan answered Oct 08 '22 18:10