I am struggling with the following issue.
I have downloaded the PLUTO NYC Manhattan Shapefile for the NYC tax lots from https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page
I am able to read it in sf with a simple st_read:
> mydf
Simple feature collection with 42638 features and 90 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: 971045.3 ymin: 188447.4 xmax: 1010027 ymax: 259571.5
epsg (SRID): NA
proj4string: +proj=lcc +lat_1=40.66666666666666 +lat_2=41.03333333333333 +lat_0=40.16666666666666 +lon_0=-74 +x_0=300000 +y_0=0 +datum=NAD83 +units=us-ft +no_defs
First 10 features:
Borough Block Lot CD CT2010 CB2010 SchoolDist Council ZipCode FireComp PolicePrct HealthCent HealthArea
1 MN 1545 52 108 138 4000 02 5 10028 E022 19 13 3700
My problem is the following: I have a dataframe as follows
> data_frame('lat' = c(40.785091,40.785091), 'lon' = c(-73.968285, -73.968285))
# A tibble: 2 x 2
lat lon
<dbl> <dbl>
1 40.785091 -73.968285
2 40.785091 -73.968285
I would like to merge this data into the mydf dataframe above, so that I can count how many latitude/longitude observations I have within each tax lot (remember, mydf is at the tax-lot granularity), and plot the corresponding map. I need to do so using sf.
In essence, something similar to
pol <- mydf %>% select(SchoolDist)
plot(pol)
but where the counts for each tax lot come from counting how many points in my latitude/longitude dataframe fall into them.
Of course, in my small example I just have 2 points in the same tax lot, so that would just highlight one single tax lot in the whole area. My real data contains a lot more points.
I think there is an easy way to do it, but I was not able to find it. Thanks!
This is how I would do it with arbitrary polygon and point data. I wouldn't merge the two and instead just use a geometry predicate to get the counts that you want. Here we:
- Take the nc dataset and transform it to the 3857 crs, which is projected rather than lat-long (this avoids a warning in st_contains).
- Generate 1000 random points within the bounding box of nc, using st_bbox and runif. Note that st_as_sf can turn a data.frame with lat-long columns into sf points.
- Use lengths(st_contains(polygons, points)) to get the counts of points per polygon. sgbp objects created by a geometry predicate are basically "for each geometry in sf x, what indices of geometries in sf y satisfy the predicate", so lengths effectively gives the number of points that satisfy the predicate for each geometry, in this case the number of points contained within each polygon (a short illustration follows after the code below).
- Since the counts are stored in the sf object as a column, we can just select and plot them with the plot.sf method.
For your data, simply replace nc with mydf and leave out the call to tibble; instead use your data.frame with the right lat-long pairs.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.1, GDAL 2.2.3, proj.4 4.9.3
nc <- system.file("shape/nc.shp", package = "sf") %>%
  read_sf() %>%
  st_transform(3857)

set.seed(1000)
points <- tibble(
  x = runif(1000, min = st_bbox(nc)[1], max = st_bbox(nc)[3]),
  y = runif(1000, min = st_bbox(nc)[2], max = st_bbox(nc)[4])
) %>%
  st_as_sf(coords = c("x", "y"), crs = 3857)

plot(nc$geometry)
plot(points$geometry, add = TRUE)

nc %>%
  mutate(pt_count = lengths(st_contains(nc, points))) %>%
  select(pt_count) %>%
  plot()
Created on 2018-05-02 by the reprex package (v0.2.0).
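If you want to see what the sgbp object itself looks like, you can inspect it directly. This is just an illustrative peek using the same nc and points objects from the reprex above:
cont <- st_contains(nc, points)   # sgbp: one integer vector of point indices per polygon
cont[[1]]                         # indices of the points inside the first polygon
lengths(cont)[1]                  # the per-polygon count used as pt_count above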
I tried this on your data, but the intersection is empty for both of the points you provided. However, the code should work.
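For completeness, applying this to your objects would look roughly like the sketch below. It assumes your lon/lat pairs are WGS84 (EPSG:4326), which is my assumption rather than something stated in the question, and transforms them to the projected state-plane CRS of mydf before counting (a CRS mismatch is a common reason for an empty intersection):
pts <- tibble(lat = c(40.785091, 40.785091), lon = c(-73.968285, -73.968285)) %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%  # assumed WGS84 lon/lat
  st_transform(st_crs(mydf))                          # match the tax-lot CRS

mydf %>%
  mutate(pt_count = lengths(st_contains(mydf, pts))) %>%
  select(pt_count) %>%
  plot()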
EDIT: Simplified group_by + mutate with add_count:
mydf = st_read("MN_Dcp_Mappinglot.shp")
xydf = data.frame(lat=c(40.758896,40.758896), lon=c(-73.985130, -73.985130))
xysf = st_as_sf(xydf, coords=c('lon', 'lat'), crs=st_crs(mydf))
## NB: make sure to st_transform both to common CRS, as Calum You suggests
xysf %>%
sf::st_intersection(mydf) %>%
dplyr::add_count(LOT)
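For reference, add_count(LOT) is shorthand for the group_by + mutate pattern mentioned in the EDIT note above; the unsimplified version would look like this:
xysf %>%
  sf::st_intersection(mydf) %>%
  dplyr::group_by(LOT) %>%
  dplyr::mutate(n = dplyr::n()) %>%  # same result as dplyr::add_count(LOT)
  dplyr::ungroup()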
Reproducible example:
nc = sf::st_read(system.file("shape/nc.shp", package="sf"))
ncxy = sf::st_as_sf(data.frame(lon=c(-80, -80.1, -82), lat=c(35.5, 35.5, 35.5)),
coords=c('lon', 'lat'), crs=st_crs(nc))
ncxy_int = ncxy %>%
  sf::st_intersection(nc) %>%
  dplyr::add_count(FIPS)
## a better approach (run on the original ncxy points, not the intersected result above)
ncxy = ncxy %>%
  sf::st_join(nc, join=st_intersects) %>%
  dplyr::add_count(FIPS)
The new column n includes the total number of points per FIPS code.
ncxy %>% dplyr::group_by(FIPS) %>% dplyr::distinct(n)
> although coordinates are longitude/latitude, st_intersects assumes that they are planar
# A tibble: 2 x 2
# Groups: FIPS [2]
FIPS n
<fctr> <int>
1 37123 2
2 37161 1
I'm not sure why your data results in an empty intersection, but since the code works on the example above there must be a separate issue.
HT: st_join approach from this answer.
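As an aside, the planar-coordinate warning quoted above goes away if both layers are projected before the join, the same trick the first answer uses with st_transform. A minimal sketch, assuming EPSG:32119 (NAD83 / North Carolina) as the projected CRS, which is my choice rather than something from the answers:
nc_proj  <- sf::st_transform(nc, 32119)   # assumed projected CRS for the nc example
pts_proj <- sf::st_as_sf(data.frame(lon = c(-80, -80.1, -82), lat = c(35.5, 35.5, 35.5)),
                         coords = c("lon", "lat"), crs = st_crs(nc)) %>%
  sf::st_transform(32119)

pts_proj %>%
  sf::st_join(nc_proj, join = st_intersects) %>%
  dplyr::add_count(FIPS)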