
How can I POST a simple HTML form in R?

I'm relatively new to R programming and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.

Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.
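For reference, a browser submitting a form like this with method="post" sends the fields as an `application/x-www-form-urlencoded` body. The sketch below builds that body in base R from the field names used in the attempts that follow (`submit`, `priceDate.year`, `priceDate.month`, `priceDate.day`). The helper name `form_urlencode` is hypothetical, and the sketch assumes those four fields are everything the server expects.

```r
# Hypothetical helper: encode a named list the way a browser encodes an
# HTML form submitted with method="post" (application/x-www-form-urlencoded).
form_urlencode <- function(fields) {
  encode_one <- function(x) {
    # reserved = TRUE also escapes characters such as "&", "=" and "+";
    # repeated = TRUE escapes an existing "%" instead of skipping the value.
    # Form encoding then writes spaces as "+" rather than "%20".
    enc <- URLencode(as.character(x), reserved = TRUE, repeated = TRUE)
    gsub("%20", "+", enc, fixed = TRUE)
  }
  keys <- vapply(names(fields), encode_one, character(1))
  vals <- vapply(fields, encode_one, character(1))
  paste(keys, vals, sep = "=", collapse = "&")
}

# Field names as used in the question; the values select 15 December 2014
fd <- list(
  submit          = "Show Prices",
  priceDate.year  = 2014,
  priceDate.month = 12,
  priceDate.day   = 15
)

body <- form_urlencode(fd)
print(body)
```

Comparing a string like this against the request body Firefox shows in its network tools is one way to check that a request built in R matches what the browser sends.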

I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server. Here are the two approaches I tried:

Attempt #1 (using RCurl):

library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
td.html <- postForm(url,
                    submit = "Show Prices",
                    priceDate.year  = 2014,
                    priceDate.month = 12,
                    priceDate.day   = 15,
                    .opts = curlOptions(ssl.verifypeer = FALSE))

This results in a web page being returned and stored in td.html, but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.

Attempt #2 (using rvest):

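library(rvest)

# url is the same treasurydirect.gov address as in Attempt #1
s <- html_session(url)
f0 <- html_form(s)
# the price-date form is the second form on the page
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
test <- submit_form(s, f1)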

Unfortunately, this approach doesn't even leave R and results in the following error message from R:

Submitting with 'submit'
Error in function (type, msg, asError = TRUE)  : <url> malformed

I can't seem to figure out how to see what "malformed" URL rvest is actually sending, so that I can try to diagnose the problem.

Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!

Daddy the Runner asked Dec 24 '14 04:12




1 Answer

Well, it appears to work with the httr library.

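library(httr)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Form fields, named exactly as in the HTML form
fd <- list(
    submit = "Show Prices",
    priceDate.year  = 2014,
    priceDate.month = 12,
    priceDate.day   = 15
)

# encode = "form" sends the fields as application/x-www-form-urlencoded,
# the same encoding a browser uses for this form
resp <- POST(url, body = fd, encode = "form")
content(resp)  # parsed HTML of the response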
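Since the goal is to download prices for many dates, the request above can be wrapped in a function of an R `Date`. This is a minimal sketch: `treasury_fields()` and `fetch_treasury_prices()` are hypothetical names, and it assumes the server accepts the same four fields with the year, month and day given as plain integers, as in the request above.

```r
# Hypothetical helper: turn an R Date (or "YYYY-MM-DD" string) into the
# form fields used in the httr request above
treasury_fields <- function(date) {
  date <- as.Date(date)
  list(
    submit          = "Show Prices",
    priceDate.year  = as.integer(format(date, "%Y")),
    priceDate.month = as.integer(format(date, "%m")),
    priceDate.day   = as.integer(format(date, "%d"))
  )
}

# Hypothetical wrapper: POST the form for one date and return the parsed page.
# httr is referenced with `::`, so it is only needed when this is called.
fetch_treasury_prices <- function(date,
    url = "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm") {
  resp <- httr::POST(url, body = treasury_fields(date), encode = "form")
  httr::stop_for_status(resp)  # fail loudly on HTTP errors (4xx/5xx)
  httr::content(resp)
}
```

For example, `lapply(c("2014-12-15", "2014-12-16"), fetch_treasury_prices)` would return one parsed page per date; note that each call makes a live request to the Treasury site.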

The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of interpreting root-relative URLs, i.e. absolute paths that lack the scheme and server name. So if you look at

f1$url
# [1] "/GA-FI/FedInvest/selectSecurityPriceDate.htm"

you can see that it contains only the path and not the server name. This appears to be what confuses httr. If you do

f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url  # replace the path-only form action with the full URL
test <- submit_form(s, f1)

that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0)
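Rather than hard-coding the full URL, the form's path-only `url` could be resolved against the address of the page it came from. The sketch below handles only the root-relative case seen here (a path starting with `/`); `resolve_form_url()` is a hypothetical helper, not part of rvest or httr. For general relative-URL resolution, the xml2 package's `url_absolute()` is an alternative.

```r
# Hypothetical helper: turn a root-relative form action such as
# "/GA-FI/..." into a full URL using the scheme and host of the page URL.
resolve_form_url <- function(action, page_url) {
  scheme_re <- "^[A-Za-z][A-Za-z0-9+.-]*://"
  # Already absolute (has a scheme such as "https://"): leave it alone
  if (grepl(scheme_re, action)) return(action)
  if (!startsWith(action, "/")) {
    stop("this sketch only handles absolute URLs and root-relative paths")
  }
  # Keep "scheme://host[:port]" from the page URL and append the path
  origin <- regmatches(page_url, regexpr(paste0(scheme_re, "[^/?#]+"), page_url))
  if (length(origin) == 0) stop("page_url has no scheme and host")
  paste0(origin, action)
}

page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
resolved <- resolve_form_url("/GA-FI/FedInvest/selectSecurityPriceDate.htm", page_url)
print(resolved)
```

With the objects from the question, `f1$url <- resolve_form_url(f1$url, url)` would do the same job as the hard-coded assignment, while still working if the form's action path changes.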

MrFlick answered Oct 20 '22 02:10