How to geocode a simple address using the Data Science Toolkit

Tags: r, maps, geocoding

I am fed up with Google's geocoding and decided to try an alternative. The Data Science Toolkit (http://www.datasciencetoolkit.org) lets you geocode an unlimited number of addresses. R has an excellent package, RDSTK (on CRAN), that serves as a wrapper for its functions. The package's street2coordinates() function interfaces with the Data Science Toolkit's geocoding utility.

However, street2coordinates() does not work if you try to geocode something as simple as "City, Country". In the following example I try to use the function to get the latitude and longitude of the city of Phoenix:

> require("RDSTK")
> street2coordinates("Phoenix+Arizona+United+States")
[1] full.address
<0 rows> (or 0-length row.names)

The Data Science Toolkit's own web utility, on the other hand, works perfectly. This URL request returns the answer: http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address=Phoenix+Arizona+United+States

I am interested in geocoding multiple addresses (both complete addresses and city names). I know that the Data Science Toolkit URL will work well.

How do I interface with the URL and get multiple latitudes and longitudes into a data frame with the addresses?

Here is a sample dataset:

dff <- data.frame(address=c(
  "Birmingham, Alabama, United States",
  "Mobile, Alabama, United States",
  "Phoenix, Arizona, United States",
  "Tucson, Arizona, United States",
  "Little Rock, Arkansas, United States",
  "Berkeley, California, United States",
  "Duarte, California, United States",
  "Encinitas, California, United States",
  "La Jolla, California, United States",
  "Los Angeles, California, United States",
  "Orange, California, United States",
  "Redwood City, California, United States",
  "Sacramento, California, United States",
  "San Francisco, California, United States",
  "Stanford, California, United States",
  "Hartford, Connecticut, United States",
  "New Haven, Connecticut, United States"
  ))
Jose R asked Apr 05 '14 22:04



2 Answers

Like this:

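library(httr)
library(rjson)

# Build a JSON array of address strings for the request body
data <- paste0("[",paste(paste0("\"",dff$address,"\""),collapse=","),"]")
url  <- "http://www.datasciencetoolkit.org/street2coordinates"
response <- POST(url,body=data)
# as="text" returns the raw JSON string for fromJSON() to parse
json     <- fromJSON(content(response,as="text"))
# lapply() always returns a list, so rbind gets one row per resolved address;
# addresses with a null result contribute nothing and are dropped
geocode  <- do.call(rbind,lapply(json,
                                 function(x) c(long=x$longitude,lat=x$latitude)))
geocode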
#                                                long      lat
# San Francisco, California, United States -117.88536 35.18713
# Mobile, Alabama, United States            -88.10318 30.70114
# La Jolla, California, United States      -117.87645 33.85751
# Duarte, California, United States        -118.29866 33.78659
# Little Rock, Arkansas, United States      -91.20736 33.60892
# Tucson, Arizona, United States           -110.97087 32.21798
# Redwood City, California, United States  -117.88536 35.18713
# New Haven, Connecticut, United States     -72.92751 41.36571
# Berkeley, California, United States      -122.29673 37.86058
# Hartford, Connecticut, United States      -72.76356 41.78516
# Sacramento, California, United States    -121.55541 38.38046
# Encinitas, California, United States     -116.84605 33.01693
# Birmingham, Alabama, United States        -86.80190 33.45641
# Stanford, California, United States      -122.16750 37.42509
# Orange, California, United States        -117.85311 33.78780
# Los Angeles, California, United States   -117.88536 35.18713

This takes advantage of the POST interface to the street2coordinates API (documented here), which returns all the results in one request rather than requiring a separate GET request per address.
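The rows above come back in a different order from the input, and an address the service cannot resolve is dropped. As a minimal sketch, assuming the parsed response is a named list keyed by address with NULL for unresolved entries, the result can be realigned with the input. The helper name dstk_to_df is hypothetical (it is not part of httr or rjson), and the example uses a hand-built list in place of a live response:

```r
# Hypothetical helper: turn a parsed street2coordinates response into a
# data frame with one row per input address, in input order.
# Assumption: `parsed` is a named list keyed by address, and an address
# that could not be resolved maps to NULL (or is absent).
dstk_to_df <- function(parsed, addresses) {
  addresses <- as.character(addresses)
  pick <- function(addr, field) {
    value <- parsed[[addr]][[field]]
    if (is.null(value)) NA_real_ else as.numeric(value)
  }
  data.frame(
    address = addresses,
    long = vapply(addresses, pick, numeric(1), field = "longitude", USE.NAMES = FALSE),
    lat  = vapply(addresses, pick, numeric(1), field = "latitude",  USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hand-built stand-in for fromJSON() output, so the example runs offline
parsed <- list(
  "Tucson, Arizona, United States"  = list(longitude = -110.970869, latitude = 32.217975),
  "Phoenix, Arizona, United States" = NULL,
  "Mobile, Alabama, United States"  = list(longitude = -88.103184, latitude = 30.701142)
)
geocoded <- dstk_to_df(parsed, c("Mobile, Alabama, United States",
                                 "Phoenix, Arizona, United States",
                                 "Tucson, Arizona, United States"))
```

With a real response this would be called as dstk_to_df(json, dff$address); unresolved addresses then show up as NA rows instead of silently disappearing.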

The absence of Phoenix seems to be a bug in the street2coordinates API. If you go to the API demo page and try "Phoenix, Arizona, United States", you get a null response. However, as your example shows, using their "Google-style Geocoder" does give a result for Phoenix. So here's a solution using repeated GET requests. Note that this runs much slower.

library(httr)
library(rjson)

geo.dsk <- function(addr){ # geocode a single address with the Data Science Toolkit
  url      <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  response <- GET(url,query=list(sensor="false",address=addr))
  json <- fromJSON(content(response,as="text"))
  loc  <- json$results[[1]]$geometry$location  # location of the first match
  return(c(address=addr,long=loc$lng, lat= loc$lat))
}
# One GET request per address; c() stores the coordinates as text
result <- do.call(rbind,lapply(as.character(dff$address),geo.dsk))
result <- data.frame(result)
result
#                                     address         long        lat
# 1        Birmingham, Alabama, United States   -86.801904  33.456412
# 2            Mobile, Alabama, United States   -88.103184  30.701142
# 3           Phoenix, Arizona, United States -112.0733333 33.4483333
# 4            Tucson, Arizona, United States  -110.970869  32.217975
# 5      Little Rock, Arkansas, United States   -91.207356  33.608922
# 6       Berkeley, California, United States   -122.29673  37.860576
# 7         Duarte, California, United States  -118.298662  33.786594
# 8      Encinitas, California, United States  -116.846046  33.016928
# 9       La Jolla, California, United States  -117.876447  33.857515
# 10   Los Angeles, California, United States  -117.885359  35.187133
# 11        Orange, California, United States  -117.853112  33.787795
# 12  Redwood City, California, United States  -117.885359  35.187133
# 13    Sacramento, California, United States  -121.555406  38.380456
# 14 San Francisco, California, United States  -117.885359  35.187133
# 15      Stanford, California, United States    -122.1675   37.42509
# 16     Hartford, Connecticut, United States   -72.763564   41.78516
# 17    New Haven, Connecticut, United States   -72.927507  41.365709
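Since the per-address GET route is slow, one option is to call it only for the addresses that a batch request failed to resolve. The following is a rough sketch under stated assumptions: the input is a data frame with address, long and lat columns in which unresolved rows hold NA, and fill_missing and fake_lookup are hypothetical names. fake_lookup stands in for geo.dsk() so the example runs without network access; it returns the Phoenix coordinates shown in the output above.

```r
# Hypothetical helper: call a per-address geocoder only for rows that are
# still missing coordinates.
# Assumption: `lookup` takes one address and returns a named vector with
# `long` and `lat` entries (the shape geo.dsk() returns).
fill_missing <- function(df, lookup) {
  todo <- which(is.na(df$long) | is.na(df$lat))
  for (i in todo) {
    hit <- lookup(df$address[i])
    df$long[i] <- as.numeric(hit[["long"]])
    df$lat[i]  <- as.numeric(hit[["lat"]])
  }
  df
}

# Offline stand-in for geo.dsk(); values copied from the Phoenix row above
fake_lookup <- function(addr) {
  c(address = addr, long = "-112.0733333", lat = "33.4483333")
}

partial <- data.frame(
  address = c("Phoenix, Arizona, United States", "Tucson, Arizona, United States"),
  long = c(NA, -110.970869),
  lat  = c(NA, 32.217975),
  stringsAsFactors = FALSE
)
filled <- fill_missing(partial, fake_lookup)
```

In real use the second argument would be geo.dsk, so only the unresolved rows cost an extra request.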
jlhoward answered Nov 19 '22 05:11



The ggmap package includes support for geocoding using either Google or the Data Science Toolkit, the latter through its "Google-style geocoder". This is quite slow for multiple addresses, as noted in the earlier answer.

library(ggmap)
result <- geocode(as.character(dff[[1]]), source = "dsk")
print(cbind(dff, result))
#                                     address        lon      lat
# 1        Birmingham, Alabama, United States  -86.80190 33.45641
# 2            Mobile, Alabama, United States  -88.10318 30.70114
# 3           Phoenix, Arizona, United States -112.07404 33.44838
# 4            Tucson, Arizona, United States -110.97087 32.21798
# 5      Little Rock, Arkansas, United States  -91.20736 33.60892
# 6       Berkeley, California, United States -122.29673 37.86058
# 7         Duarte, California, United States -118.29866 33.78659
# 8      Encinitas, California, United States -116.84605 33.01693
# 9       La Jolla, California, United States -117.87645 33.85751
# 10   Los Angeles, California, United States -117.88536 35.18713
# 11        Orange, California, United States -117.85311 33.78780
# 12  Redwood City, California, United States -117.88536 35.18713
# 13    Sacramento, California, United States -121.55541 38.38046
# 14 San Francisco, California, United States -117.88536 35.18713
# 15      Stanford, California, United States -122.16750 37.42509
# 16     Hartford, Connecticut, United States  -72.76356 41.78516
# 17    New Haven, Connecticut, United States  -72.92751 41.36571
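Because each address costs one request, a dataset with repeated addresses can be sped up by geocoding each distinct address once and mapping the results back. This is only a sketch: geocode_unique is a hypothetical helper, it assumes the geocoding function returns a data frame with one row per input address, and fake_geocode is an offline stand-in that returns dummy numbers rather than real coordinates.

```r
# Hypothetical helper: geocode each distinct address once, then expand the
# result back to one row per original address.
# Assumption: `geocode_fn` takes a character vector and returns a data
# frame with one row per element, in the same order.
geocode_unique <- function(addresses, geocode_fn) {
  addresses <- as.character(addresses)
  distinct  <- unique(addresses)
  coords    <- geocode_fn(distinct)
  out <- coords[match(addresses, distinct), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Offline stand-in for a real geocoder; it counts how many addresses it
# was asked to resolve and returns dummy values, not real coordinates.
requested <- 0
fake_geocode <- function(x) {
  requested <<- requested + length(x)
  data.frame(lon = seq_along(x) * 1.0, lat = seq_along(x) * -1.0)
}

addr <- c("Tucson, Arizona, United States",
          "Mobile, Alabama, United States",
          "Tucson, Arizona, United States")
coords <- geocode_unique(addr, fake_geocode)
```

With ggmap, the second argument could be something like function(x) geocode(x, source = "dsk"). The sample data in the question has no duplicates, so this only helps when the real data does.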
mvkorpel answered Nov 19 '22 07:11
