zillow api with R - XML issue

Tags: r, xml, zillow

I'm trying to read information from the Zillow API and am running into some data structure issues in R. My output is supposed to be XML, and appears to be, but doesn't behave like XML.

Specifically, the object that GetSearchResults() returns to me looks similar to XML, but R's XML reading functions can't read it.

Can you tell me how I should approach this?

#set directory
setwd('[YOUR DIRECTORY]')

# setup libraries
library(dplyr)
library(XML)
library(ZillowR)
library(RCurl)

# setup api key
set_zillow_web_service_id('[YOUR API KEY]')

xml = GetSearchResults(address = '120 East 7th Street', citystatezip = '10009')
data = xmlParse(xml)

This throws the following error:

Error: XML content does not seem to be XML

The Zillow API documentation clearly states that the output should be XML, and it certainly looks like it. I'd like to be able to easily access various components of the API output for larger-scale data manipulation and aggregation. Let me know if you have any ideas.

asked Mar 11 '23 11:03 by AME


1 Answer

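This was a fun opportunity for me to get acquainted with the Zillow API. The error occurs because xmlParse() expects XML text or a file name, while GetSearchResults() returns an R list whose response element already holds the parsed XML. My approach, following the Stack Overflow question "How to parse XML to R data frame", was to convert that response to a list for ease of inspection. The onerous part was working out the structure of the data by inspecting the list, particularly because any given property may be missing some fields. That is why I wrote the getValRange function, which handles the Zestimate valuation range when its low or high value is absent.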
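The same idea of tolerating missing fields can be sketched in plain base R. This is a minimal illustration, not part of the original answer: field_or_na() is a hypothetical helper, and the property list is a made-up stand-in for one parsed result (its field names mirror those used in the code below, but the values are invented).

```r
# Hypothetical helper (not from ZillowR or XML): follow a path of names
# through a nested list, returning NA as soon as a level is missing.
field_or_na <- function(x, ...) {
  for (key in c(...)) {
    # Stop at a non-list level or a name that is not present
    if (!is.list(x) || is.null(x[[key]])) return(NA)
    x <- x[[key]]
  }
  if (length(x) == 0) NA else x
}

# Made-up stand-in for one parsed property; valueChange is absent
property <- list(
  zpid = "1",
  zestimate = list(amount = list(text = "100"), percentile = "0")
)

field_or_na(property, "zestimate", "amount", "text")
field_or_na(property, "zestimate", "valueChange")
```

Each lookup yields NA rather than NULL or an error, so a list built from these calls keeps the same set of named fields, and unlist() will not silently drop any of them.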

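library(XML)

# `xml` is the list returned by GetSearchResults() in the question;
# its `response` element already holds the parsed XML.
results <- xmlToList(xml$response[["results"]])

# Return the "low" or "high" bound of a valuationRange, or NA if absent
getValRange <- function(x, hilo) {
  ifelse(hilo %in% unlist(dimnames(x)), x["text", hilo][[1]], NA)
}

# Each column of `results` is one property; collect its fields in a list
out <- apply(results, MARGIN = 2, function(property) {
  zpid <- property$zpid
  links <- unlist(property$links)
  address <- unlist(property$address)
  z <- property$zestimate
  # Fall back to NA for Zestimate fields that may be missing
  zestdf <- list(
    amount = ifelse("text" %in% names(z$amount), z$amount$text, NA),
    lastupdated = z[["last-updated"]],
    valueChange = ifelse(length(z$valueChange) == 0, NA, z$valueChange),
    valueLow = getValRange(z$valuationRange, "low"),
    valueHigh = getValRange(z$valuationRange, "high"),
    percentile = z$percentile)
  list(id = zpid, links, address, zestdf)
})

# Flatten each property to a named vector and stack them into a data frame
data <- as.data.frame(do.call(rbind, lapply(out, unlist)),
  row.names = seq_along(out))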

Sample output:

> data[,c("id", "street", "zipcode", "amount")]
          id              street zipcode  amount
1 2098001736 120 E 7th St APT 5A   10009 2321224
2 2101731413 120 E 7th St APT 1B   10009 2548390
3 2131798322 120 E 7th St APT 5B   10009 2408860
4 2126480070 120 E 7th St APT 1A   10009 2643454
5 2125360245 120 E 7th St APT 2A   10009 1257602
6 2118428451 120 E 7th St APT 4A   10009    <NA>
7 2125491284 120 E 7th St FRNT 1   10009    <NA>
8 2126626856 120 E 7th St APT 2B   10009 2520587
9 2131542942 120 E 7th St APT 4B   10009 1257676
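One caveat with the final do.call(rbind, lapply(out, unlist)) step: rbind() matches entries by position, not by name, and recycles a shorter vector (with a warning) instead of padding it with NA. If a property lacks a field entirely, its row can end up with shifted or repeated values. A minimal base-R sketch of aligning rows by name instead follows; bind_by_name() is a hypothetical helper and the rows are made up.

```r
# Hypothetical helper: stack named character vectors into a data frame,
# matching entries by name so a missing field becomes NA in its own column.
bind_by_name <- function(rows) {
  cols <- unique(unlist(lapply(rows, names)))
  # Indexing by a name that is not present yields NA
  aligned <- lapply(rows, function(r) unname(as.character(r[cols])))
  mat <- matrix(unlist(aligned), nrow = length(rows), byrow = TRUE,
                dimnames = list(NULL, cols))
  as.data.frame(mat, stringsAsFactors = FALSE)
}

# Made-up rows: the second one has no "amount" field
rows <- list(
  c(id = "1", street = "A St", amount = "100"),
  c(id = "2", street = "B St")
)

df <- bind_by_name(rows)
df$amount
```

Assuming every flattened row carries names, as the unlist() output does here, bind_by_name(lapply(out, unlist)) could stand in for the do.call(rbind, ...) call. All columns come back as character, so numeric fields such as amount still need converting.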
answered Mar 24 '23 16:03 by Weihuang Wong