I am trying to get R to fill in the 'Search by postcode' field on this webpage http://cti.voa.gov.uk/cti/ with predefined text (e.g. BN1 1NA), advance to the next page and scrape the resulting 4-column table, which, depending on the postcode, can span multiple pages. To make it more complex, the 'Improvement indicator' is not a text field but an image file (as seen if you search with postcode BN1 3HP). I would prefer this column to contain either a 0 or a 1 depending on whether the image is present.
Ultimately I am after a nice data frame that mirrors the 4 columns on screen.
I have tried to modify the suggestions from this question to do what I have described above with no luck, and to be honest I am out of my depth trying to decipher this one.
I realise R may not be the best-suited tool for what I need to do, but it's all I have available to me. Any help would be greatly appreciated.
I'm not sure what the T&C of the VOA website have to say about scraping, but this code will do the job:
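library("httr")
library("rvest")

# Postcode (or partial postcode) to search for
post_code <- "B1 1"

# Submit the search form; the body fields mirror what the browser sends
resp <- POST("http://cti.voa.gov.uk/cti/InitS.asp?lcn=0",
             encode = "form",
             body = list(btnPush = 1,
                         txtPageNum = 0,
                         txtPostCode = post_code,
                         txtRedirectTo = "InitS.asp",
                         txtStartKey = 0))

# Parse the returned HTML and extract the results table as a data frame
resp_cont <- read_html(resp)
council_table <- resp_cont %>%
  html_node(".scl_complex table") %>%
  html_table()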
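The form fields in the request body can be found by watching what the browser sends when you submit the search. Firebug has an excellent 'Net' panel where the POST headers and parameters can be seen, and most modern browsers have something similar built in.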
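For the 'Improvement indicator' column, html_table only keeps the text of each cell, so a cell that holds nothing but an image comes back empty. Below is a minimal sketch of one way to turn that column into 0/1. It assumes the indicator is an img element inside the third cell of each row, which I have not checked against the live page. The HTML fragment, the other column names and the file name flag.gif are all made up for illustration.

```r
library("rvest")

# Made-up fragment standing in for the real results page (an assumption,
# not the actual VOA markup)
sample_html <- '<div class="scl_complex"><table>
<tr><th>Address</th><th>Band</th><th>Improvement indicator</th><th>Reference</th></tr>
<tr><td>1 EXAMPLE STREET</td><td>A</td><td></td><td>REF001</td></tr>
<tr><td>2 EXAMPLE STREET</td><td>B</td><td><img src="flag.gif" alt="flag"></td><td>REF002</td></tr>
</table></div>'

page <- read_html(sample_html)

# All table rows, minus the header row
rows <- html_nodes(page, ".scl_complex table tr")[-1]

# html_node() on a set of rows returns one result per row (missing if the
# row has no match), so html_attr() gives NA wherever there is no image
img_src <- rows %>%
  html_node("td:nth-child(3) img") %>%
  html_attr("src")

# 1 if an image was found in the third cell, 0 otherwise
improvement <- as.integer(!is.na(img_src))

# Build the data frame as before, then overwrite the third column
council_table <- page %>%
  html_node(".scl_complex table") %>%
  html_table()
council_table[[3]] <- improvement
```

With the made-up fragment, only the second row is flagged. On the real page you would swap read_html(sample_html) for read_html(resp) and adjust the selector if the image sits in a different cell.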