I am struggling with the following issue.
I have downloaded the PLUTO NYC Manhattan Shapefile for the NYC tax lots from https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read it in sf with a simple st_read:
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following: I have a dataframe as follows
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data into the mydf dataframe above, so that I can count how many latitude/longitude observations I have within each tax lot (remember, mydf is at the tax-lot granularity), and plot the corresponding map. I need to do so using sf.
In essence, something similar to
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it. Thanks!
This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:
- Take the nc dataset and transform it to the 3857 crs, which is projected rather than lat-long (this avoids a warning in st_contains).
- Generate 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat-long columns into sf points.
- Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, what indices of geometries in sf y satisfy the predicate", so lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon (a short illustration follows after the code below).
- Since the counts are stored in the sf object as a column, we can just select and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble; instead use your data.frame with the right lat-long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package = "sf") %>%
  read_sf() %>%
  st_transform(3857)

set.seed(1000)
points <- tibble(
  x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
  y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
  st_as_sf(coords = c("x", "y"), crs = 3857)

plot(nc$geometry)
plot(points$geometry, add = TRUE)

nc %>%
  mutate(pt_count = lengths(st_contains(nc, points))) %>%
  select(pt_count) %>%
  plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
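If you want to see what the sgbp object itself looks like, you can inspect it directly. This is just an illustrative peek using the same nc and points objects from the reprex above:
cont <- st_contains(nc, points)   # sgbp: one integer vector of point indices per polygon
cont[[1]]                         # indices of the points inside the first polygon
lengths(cont)[1]                  # the per-polygon count used as pt_count above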
I tried this on your data, but the intersection is empty for both of the points you provided. However, the code should work.
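For completeness, applying this to your objects would look roughly like the sketch below. It assumes your lon/lat pairs are WGS84 (EPSG:4326), which is my assumption rather than something stated in the question, and transforms them to the projected state-plane CRS of mydf before counting (a CRS mismatch is a common reason for an empty intersection):
pts <- tibble(lat = c(40.785091, 40.785091), lon = c(-73.968285, -73.968285)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%  # assumed WGS84 lon/lat
  st_transform(st_crs(mydf))                          # match the tax-lot CRS

mydf %>%
  mutate(pt_count = lengths(st_contains(mydf, pts))) %>%
  select(pt_count) %>%
  plot()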
EDIT: Simplified group_by + mutate with add_count:
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: make sure to st_transform both to common CRS, as Calum You suggests
xysf %>%
sf::st_intersection(mydf) %>%
dplyr::add_count(LOT)
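For reference, add_count(LOT) is shorthand for the group_by + mutate pattern mentioned in the EDIT note above; the unsimplified version would look like this:
xysf %>%
  sf::st_intersection(mydf) %>%
  dplyr::group_by(LOT) %>%
  dplyr::mutate(n = dplyr::n()) %>%  # same result as dplyr::add_count(LOT)
  dplyr::ungroup()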
Reproducible example:
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncxy = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy_int = ncxy %>%
  sf::st_intersection(nc) %>%
  dplyr::add_count(FIPS)
## a better approach (run on the original ncxy points, not the intersected result above)
ncxy = ncxy %>%
  sf::st_join(nc, join=st_intersects) %>%
  dplyr::add_count(FIPS)
The new column n includes the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.
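As an aside, the planar-coordinate warning quoted above goes away if both layers are projected before the join, the same trick the first answer uses with st_transform. A minimal sketch, assuming EPSG:32119 (NAD83 / North Carolina) as the projected CRS, which is my choice rather than something from the answers:
nc_proj  <- sf::st_transform(nc, 32119)   # assumed projected CRS for the nc example
pts_proj <- sf::st_as_sf(data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5)),
                         coords = c("lon", "lat"), crs = st_crs(nc)) %>%
  sf::st_transform(32119)

pts_proj %>%
  sf::st_join(nc_proj, join = st_intersects) %>%
  dplyr::add_count(FIPS)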