This code scrapes, from http://www.bls.gov/schedule/news_release/2015_sched.htm, every date whose Release column contains Employment Situation.
library(rvest)

pg <- read_html("http://www.bls.gov/schedule/news_release/2015_sched.htm")
# target only the <td> elements under the bodytext div
body <- html_nodes(pg, "div#bodytext")
# we use this new set of nodes and a relative XPath to get the initial <td> elements, then get their siblings
es_nodes <- html_nodes(body, xpath=".//td[contains(., 'Employment Situation for')]/../td[1]")
# clean up the cruft and make our dates!
nfpdates2015 <- as.Date(trimws(html_text(es_nodes)), format="%A, %B %d, %Y")
###thanks @hrbrmstr for this###
I would like to repeat that for other URLs covering other years, which are named the same way with only the year number changing. Specifically, for the following URLs:
#From 2008 to 2015
http://www.bls.gov/schedule/news_release/2015_sched.htm
http://www.bls.gov/schedule/news_release/2014_sched.htm
...
http://www.bls.gov/schedule/news_release/2008_sched.htm
My knowledge of rvest, HTML and XML is almost non-existent. I thought to apply the same code with a for loop, but my efforts were futile. Of course I could just repeat the code for 2015 eight times to get all years; it would take neither too long nor too much space. Yet I am very curious to know how this could be done in a more efficient way. Thank you.
In a loop you would change the url string using a paste0 statement:
for(i in 2008:2015){
  url <- paste0("http://www.bls.gov/schedule/news_release/", i, "_sched.htm")
  pg <- read_html(url)
  ## all your other code goes here.
}
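One thing to watch with the for loop: each pass overwrites whatever the previous pass produced, so the results need somewhere to go. Here is a minimal sketch of that bookkeeping, assuming you want one list slot per year. The names years, urls and results are made up for illustration, and the assignment inside the loop is only a placeholder for the scraping code, so the snippet runs without touching the network.

```r
years <- 2008:2015

# sprintf() is vectorised, so one call builds all eight URLs
urls <- sprintf("http://www.bls.gov/schedule/news_release/%d_sched.htm", years)

# pre-allocate a list with one named slot per year
results <- vector("list", length(years))
names(results) <- years

for (i in seq_along(years)) {
  # placeholder: in real use, run the scraping code on urls[i] here
  # and store the resulting Date vector instead of the URL itself
  results[[i]] <- urls[i]
}
```

After the loop, results[["2012"]] holds whatever was stored for 2012, which is easier to work with than a variable that only keeps the last year.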
Or use lapply to return a list of the results:
lst <- lapply(2008:2015, function(x){
  url <- paste0("http://www.bls.gov/schedule/news_release/", x, "_sched.htm")
  ## all your other code goes here.
  pg <- read_html(url)
  # target only the <td> elements under the bodytext div
  body <- html_nodes(pg, "div#bodytext")
  # we use this new set of nodes and a relative XPath to get the initial <td> elements, then get their siblings
  es_nodes <- html_nodes(body, xpath=".//td[contains(., 'Employment Situation for')]/../td[1]")
  # clean up the cruft and make our dates!
  nfpdates <- as.Date(trimws(html_text(es_nodes)), format="%A, %B %d, %Y")
  return(nfpdates)
})
Which returns
lst
[[1]]
[1] "2008-01-04" "2008-02-01" "2008-03-07" "2008-04-04" "2008-05-02" "2008-06-06" "2008-07-03" "2008-08-01" "2008-09-05"
[10] "2008-10-03" "2008-11-07" "2008-12-05"
[[2]]
[1] "2009-01-09" "2009-02-06" "2009-03-06" "2009-04-03" "2009-05-08" "2009-06-05" "2009-07-02" "2009-08-07" "2009-09-04"
[10] "2009-10-02" "2009-11-06" "2009-12-04"
## etc...
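If you would rather have one long vector of dates than a list with one element per year, the list can be flattened. A small sketch, assuming the elements are Date vectors like the ones printed above: lst_demo is a made-up stand-in holding a few of those dates so the snippet runs on its own. Note that unlist() drops the Date class and gives back plain numbers, whereas do.call(c, ...) keeps the dates as dates.

```r
# a small stand-in for `lst`, using dates from the output above
lst_demo <- list(
  as.Date(c("2008-01-04", "2008-02-01", "2008-03-07")),
  as.Date(c("2009-01-09", "2009-02-06", "2009-03-06"))
)

# c() has a Date method, so the combined vector is still of class Date
all_dates <- do.call(c, lst_demo)

# for comparison: unlist() returns the underlying day counts as numbers
as_numbers <- unlist(lst_demo)
```

With the real results the same call would be do.call(c, lst).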