 

Using R to add field to online form and scrape resulting javascript created table

I am trying to get R to complete the 'Search by postcode' field on this webpage http://cti.voa.gov.uk/cti/ with predefined text (e.g. BN1 1NA), advance to the next page and scrape the resulting 4-column table, which, depending on the postcode, can span multiple pages. To make it more complex, the 'Improvement indicator' is not a text field but an image (as seen if you search with postcode BN1 3HP). I would prefer this column to contain a 0 or 1 depending on whether the image is present.

Ultimately I am after a nice data frame that mirrors the 4 columns on screen.

I have tried to modify the suggestions from this question to do what I have described above with no luck, and to be honest I am out of my depth trying to decipher this one.

I realise R may not be the most suited for what I need to do, but it's all I have available to me. Any help would be greatly appreciated.

asked Jul 08 '15 by Chris


1 Answer

I'm not sure what the T&C of the VOA website have to say about scraping, but this code will do the job:

library("httr")
library("rvest")

post_code <- "B1 1"

# Replicate the form submission the browser makes; the field names in
# `body` come from the POST request the search page sends.
resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0",
             encode = "form",
             body = list(btnPush = 1,
                         txtPageNum = 0,          # results page number
                         txtPostCode = post_code, # postcode to search for
                         txtRedirectTo = "InitS.asp",
                         txtStartKey = 0))

# Parse the response and extract the results table as a data frame
resp_cont <- read_html(resp)
council_table <- resp_cont %>%
  html_node(".scl_complex table") %>%
  html_table()
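For postcodes whose results span several pages, one approach is to repeat the POST while incrementing `txtPageNum` and bind the per-page tables together. This is a hedged sketch: the assumption that `txtPageNum` drives pagination (and that a missing or empty table signals the last page) is inferred from the form fields above, not confirmed against the site.

```r
library("httr")
library("rvest")

post_code <- "B1 1"
all_pages <- list()
page <- 0
max_pages <- 50  # safety cap in case the stopping assumption is wrong

repeat {
  resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0",
               encode = "form",
               body = list(btnPush = 1,
                           txtPageNum = page,
                           txtPostCode = post_code,
                           txtRedirectTo = "InitS.asp",
                           txtStartKey = 0))
  node <- html_node(read_html(resp), ".scl_complex table")
  # Stop when no results table comes back (assumed end-of-results signal)
  if (inherits(node, "xml_missing")) break
  tbl <- html_table(node)
  if (nrow(tbl) == 0) break
  all_pages[[length(all_pages) + 1]] <- tbl
  page <- page + 1
  if (page >= max_pages) break
}

full_table <- do.call(rbind, all_pages)
```

Checking what actually changes in the POST headers between page 1 and page 2 (see the note on Firebug below) would confirm whether `txtPageNum`, `txtStartKey`, or both need to be advanced.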

Firebug has an excellent 'Net' panel where the POST headers can be seen. Most modern browsers also have something similar built in.
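As for the 'Improvement indicator' column: `html_table()` drops images, so the cells can instead be tested for the presence of an `<img>` element and coerced to 0/1. The HTML fragment and the `results` class below are invented for illustration; on the live page the selector would be `".scl_complex table"` and the real column position would need checking.

```r
library("rvest")

# Invented fragment mimicking a results table where one row carries an
# improvement-indicator image and one does not
page <- read_html('
  <table class="results">
    <tr><th>Address</th><th>Improvement indicator</th></tr>
    <tr><td>1 High St</td><td><img src="tick.gif"></td></tr>
    <tr><td>2 High St</td><td></td></tr>
  </table>')

rows <- html_nodes(page, "table.results tr")[-1]  # drop the header row
improvement <- as.integer(vapply(rows, function(r)
  length(html_nodes(r, "img")) > 0, logical(1)))
improvement
#> [1] 1 0
```

The resulting vector can be assigned to the data frame (`council_table$Improvement <- improvement`) provided the row counts line up.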

answered Nov 10 '22 by Nick Kennedy