I'm relatively new to R programming and I'm trying to put some of what I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.
Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.
I have tried two different R packages to submit a request to the US Treasury web server, without success. Here are the two approaches I tried:
Attempt #1 (using RCurl):
library(RCurl)
url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
td.html <- postForm(url,
  submit = "Show Prices",
  priceDate.year = 2014,
  priceDate.month = 12,
  priceDate.day = 15,
  .opts = curlOptions(ssl.verifypeer = FALSE))
This returns a web page, which is stored in td.html, but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.
Attempt #2 (using rvest):
library(rvest)
s <- html_session(url)
f0 <- html_form(s)
f1 <- set_values(f0[[2]], priceDate.year = 2014, priceDate.month = 12, priceDate.day = 15)
test <- submit_form(s, f1)
Unfortunately, this approach doesn't even leave R and results in the following error message from R:
Submitting with 'submit'
Error in function (type, msg, asError = TRUE) : <url> malformed
I can't seem to figure out how to see what "malformed" text is being sent to rvest so that I can try to diagnose the problem.
Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!
Well, it appears to work with the httr library.
library(httr)
url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
fd <- list(
submit = "Show Prices",
priceDate.year = 2014,
priceDate.month = 12,
priceDate.day = 15
)
resp<-POST(url, body=fd, encode="form")
content(resp)
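To make it clearer what encode="form" is doing, here is a rough base-R sketch of how a named list becomes a form-encoded request body. The helper name form_encode is hypothetical and not part of httr; httr does its own encoding internally, and the exact escaping it applies may differ from this sketch (browsers, for instance, usually send a space as "+" rather than "%20").

```r
# Hypothetical helper (not part of httr): build an
# application/x-www-form-urlencoded body from a named list.
form_encode <- function(fields) {
  # Percent-encode anything outside the unreserved character set
  enc <- function(x) URLencode(as.character(x), reserved = TRUE)
  keys <- vapply(names(fields), enc, character(1))
  vals <- vapply(fields, enc, character(1))
  # Join each name=value pair with "&"
  paste(paste0(keys, "=", vals), collapse = "&")
}

fd <- list(
  submit = "Show Prices",
  priceDate.year = 2014,
  priceDate.month = 12,
  priceDate.day = 15
)
body <- form_encode(fd)
print(body)
```

To inspect the request that is really sent, httr's verbose() can be passed as an extra argument, e.g. POST(url, body=fd, encode="form", verbose()).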
The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of handling form URLs that are absolute paths without the server name. So if you look at
f1$url
# [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm
you see that it just has the path and not the server name. This appears to be confusing httr. If you do
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url
test <- submit_form(s, f1)
that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0)
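Rather than hard-coding the full address, the same workaround can be sketched more generally by prepending the scheme and host of the page to the form's path. The helper resolve_url below is hypothetical (it is not an rvest or httr function) and only handles paths that start with "/"; it is a minimal sketch, not a full relative-URL resolver.

```r
# Hypothetical helper: prepend the scheme and host of `base` to a
# root-relative path such as "/GA-FI/...". Full URLs pass through unchanged.
resolve_url <- function(path, base) {
  if (grepl("^https?://", path)) return(path)
  # Keep only "scheme://host" from the base URL
  origin <- sub("^(https?://[^/]+).*$", "\\1", base)
  paste0(origin, path)
}

base <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
fixed <- resolve_url("/GA-FI/FedInvest/selectSecurityPriceDate.htm", base)
print(fixed)

# Possible use with the form above, assuming `url` still holds the page address:
# f1$url <- resolve_url(f1$url, url)
```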