I am trying to scrape this website with rvest. Unfortunately, the data I want is hidden behind a button (the plus symbol).
I tried the following rvest code:
```r
library(rvest)

url <- 'https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/show?name=&defaultValue=true&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&dateTime.endDateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&area.values=CTY|10YBE----------2!BZN|10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&productionType.values=B20&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable_length=100'

# Download the static HTML and extract the value cells found with SelectorGadget.
# The selector matches individual cells, not a <table>, so html_text() is used
# instead of html_table().
htmlpage <- read_html(url) %>%
  html_nodes(".dv-value-cell") %>%
  html_text(trim = TRUE)
```
I found the ".dv-value-cell" selector with SelectorGadget (as described in one of the rvest vignettes).
However, the plus menu has to be expanded first: the data in this sub-table does not exist in the page until the button is clicked, so the code above returns an empty result.
I used the Chrome developer tools, as described in this question, to monitor what happens when I click the button. They show a request to the following URL (shortened to highlight only the difference from the original URL):
https://transparency.entsoe.eu/...&dateTime.timezone_input=UTC&dv-datatable-detail_22WAMERCO000010Y_22WAMERCO000008L_length=10&dv-datatable_length=50&detailId=22WAMERCO000010Y_22WAMERCO000008L
As you can see, this is the original URL with a few additional query parameters (such as detailId). However, when I open this URL in my browser, it does not show the desired result, so the website must be sending something else along with the request.
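One way to start reproducing the request is to rebuild that URL from the original one. The sketch below uses a hypothetical helper, build_detail_url(), that appends the parameters visible in the network tab; the default row counts (10 for the detail table, 50 for the main table) are copied from the observed request and are assumptions that may need adjusting.

```r
# Hypothetical helper: rebuild the "detail" URL that the browser requests
# when the plus button is clicked. Parameter names are copied from the
# request seen in Chrome; the default row counts are assumptions.
build_detail_url <- function(base_url, detail_id, detail_length = 10,
                             table_length = 50) {
  # Drop the existing page-size parameter so it is not sent twice
  base_url <- sub("&dv-datatable_length=[0-9]+", "", base_url)
  paste0(
    base_url,
    "&dv-datatable-detail_", detail_id, "_length=", detail_length,
    "&dv-datatable_length=", table_length,
    "&detailId=", detail_id
  )
}
```

For example, build_detail_url(url, "22WAMERCO000010Y_22WAMERCO000008L") produces a URL with the same shape as the one Chrome shows. It only rebuilds the address, which, as noted above, does not seem to be enough on its own.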
According to Chrome, the result of this request is exactly the data I'm looking for (right-click > copy > copy result). So there should be a way to download just this data.
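Since the URL alone does not reproduce the result, the missing piece is probably in the request headers or cookies. Below is a minimal httr sketch, assuming (unverified) that the server needs the session cookies set by the main page plus the X-Requested-With: XMLHttpRequest header that browsers typically send with AJAX calls. The function name fetch_detail_tables() is hypothetical; compare against the request headers listed in Chrome's network tab and add any others that differ.

```r
# Sketch: fetch the detail sub-table directly. Assumes the server only needs
# (a) cookies from the main page and (b) the X-Requested-With header.
# Both are guesses to be checked against Chrome's network tab.
fetch_detail_tables <- function(page_url, detail_url) {
  # httr reuses one handle per host, so cookies set by this first request
  # are sent again with the second one
  httr::stop_for_status(httr::GET(page_url))
  resp <- httr::GET(
    detail_url,
    httr::add_headers(`X-Requested-With` = "XMLHttpRequest")
  )
  httr::stop_for_status(resp)
  html <- httr::content(resp, as = "text", encoding = "UTF-8")
  # Returns a list with one data frame per <table> in the response
  rvest::html_table(rvest::read_html(html))
}
```

Usage would be tables <- fetch_detail_tables(url, detail_url), where detail_url is the full request URL copied from Chrome. If the response turns out to be JSON rather than HTML, the last step would need a JSON parser such as jsonlite::fromJSON() instead.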
I also found this question about a similar problem, but unfortunately the solution is quite specific to that case and lacks a general explanation.
How can I reproduce this browser request such that I receive the same table?
If you are not scraping a large amount of data, I suggest using Selenium. With Selenium you can actually click the button. You can start by looking into web scraping with R and Selenium.
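A rough sketch of the Selenium route in R, assuming the RSelenium package and a matching browser driver are installed (RSelenium is my choice of binding, not something named above). The function scrape_after_click() is hypothetical, Firefox is an arbitrary choice, and button_css is a placeholder for the real CSS selector of the plus button, which you can find with SelectorGadget or the browser's inspector.

```r
# Sketch with RSelenium: open the page, click the plus button, then parse the
# rendered HTML with rvest. button_css is a placeholder selector.
scrape_after_click <- function(url, button_css, wait_seconds = 3) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remote <- driver$client
  # Shut down the browser and the Selenium server when the function exits
  on.exit({
    remote$close()
    driver$server$stop()
  })
  remote$navigate(url)
  button <- remote$findElement(using = "css selector", value = button_css)
  button$clickElement()
  Sys.sleep(wait_seconds)  # crude wait for the sub-table to load
  page <- rvest::read_html(remote$getPageSource()[[1]])
  rvest::html_text(rvest::html_nodes(page, ".dv-value-cell"), trim = TRUE)
}
```

Usage would be cells <- scrape_after_click(url, "<CSS selector of the plus button>"). The fixed Sys.sleep() is the simplest option; polling remote$findElements() until the cells appear would be more robust.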
You can also use PhantomJS. It works much like Selenium, but it is headless, so no visible browser window is required.
I hope one of them will help.