
Scrape website that requires button click

I am trying to scrape this website. Unfortunately, the data that I want to scrape using rvest is hidden behind a button (the plus symbol).

I tried the rvest package with the following code:

library(rvest)
url <- 'https://transparency.entsoe.eu/generation/r2/actualGenerationPerGenerationUnit/show?name=&defaultValue=true&viewType=TABLE&areaType=BZN&atch=false&dateTime.dateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&dateTime.endDateTime=17.03.2017+00:00|UTC|DAYTIMERANGE&area.values=CTY|10YBE----------2!BZN|10YBE----------2&productionType.values=B02&productionType.values=B03&productionType.values=B04&productionType.values=B05&productionType.values=B06&productionType.values=B07&productionType.values=B08&productionType.values=B09&productionType.values=B10&productionType.values=B11&productionType.values=B12&productionType.values=B13&productionType.values=B14&productionType.values=B15&productionType.values=B16&productionType.values=B17&productionType.values=B18&productionType.values=B19&productionType.values=B20&dateTime.timezone=UTC&dateTime.timezone_input=UTC&dv-datatable_length=100'

# Load the page and select the value cells found with SelectorGadget
htmlpage <- html_session(url) %>%
  read_html() %>%
  html_nodes(".dv-value-cell") %>%
  html_table()

The ".dv-value-cell" selector was found on the website with SelectorGadget (described in one of the rvest vignettes).

However, before I can use this code, I still need to open the plus menu. The data inside this sub table doesn't exist before clicking the button. Therefore, the code above will return an empty value.

I used the Chrome developer tools described in this question to monitor what happens when I click the button. According to that information, a request is sent to the following URL (shortened to highlight only the difference from the original URL):

https://transparency.entsoe.eu/...&dateTime.timezone_input=UTC&dv-datatable-detail_22WAMERCO000010Y_22WAMERCO000008L_length=10&dv-datatable_length=50&detailId=22WAMERCO000010Y_22WAMERCO000008L

As you can see, this is the original URL with a few additional query parameters. However, when I open this URL in my browser, it doesn't show the desired result. I must be missing something else that the website sends along with the request.

The result of this request according to Chrome is exactly the data that I'm looking for (right-click > copy > copy result). So there should be a way to just download this specific data.
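A minimal sketch of how such a request might be replayed from R, assuming the extra query parameters shown above are the only difference in the URL and assuming (not verified for this site) that the server returns the detail table only to requests that identify themselves as AJAX calls through an X-Requested-With header. The helper names build_detail_url and fetch_detail are hypothetical, and the sketch needs the httr and xml2 packages.

```r
# Hypothetical helper: append the detail parameters seen in the browser's
# network tab to the original page URL.
build_detail_url <- function(base_url, detail_id, page_length = 10) {
  detail_params <- c(
    paste0("dv-datatable-detail_", detail_id, "_length=", page_length),
    paste0("detailId=", detail_id)
  )
  paste0(base_url, "&", paste(detail_params, collapse = "&"))
}

# Hypothetical helper: send the request the way a browser's XHR might.
# ASSUMPTION: the X-Requested-With header is what the plain browser visit
# was missing; compare with the headers listed in the network tab.
fetch_detail <- function(detail_url) {
  response <- httr::GET(
    detail_url,
    httr::add_headers(`X-Requested-With` = "XMLHttpRequest")
  )
  httr::stop_for_status(response)
  xml2::read_html(httr::content(response, as = "text", encoding = "UTF-8"))
}

# Example (not run): `url` is the original page URL from the code above.
# detail_url <- build_detail_url(url, "22WAMERCO000010Y_22WAMERCO000008L")
# detail_page <- fetch_detail(detail_url)
```

If the response still differs from what Chrome shows, the request headers, cookies, and method listed in the network tab are the next things to compare against what httr sends.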

I also found this question about a similar problem, but unfortunately the solution is quite specific to that case and lacks a general explanation.

How can I reproduce this browser request such that I receive the same table?

takje asked Oct 30 '22 10:10


1 Answer

If you are not scraping a large set of data, I suggest using Selenium. With Selenium you can actually click the button. A good place to start is a tutorial on scraping with R and Selenium.
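As a rough sketch of the Selenium approach, assuming the RSelenium and rvest packages and a local Firefox installation: the function below opens the page, clicks a button, waits, and reads the cells from the rendered HTML. The function name scrape_after_click, the button_css argument, and the fixed wait are illustrative assumptions; the CSS selector of the plus button is not given in the question and has to be looked up in the page.

```r
# Requires the RSelenium, xml2 and rvest packages plus a local Firefox.
# The button selector and wait time are placeholders, not values verified
# on the site.
scrape_after_click <- function(url, button_css, cell_css = ".dv-value-cell",
                               wait_seconds = 2) {
  # Start a Selenium server and a browser session.
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remote <- driver$client
  # Always shut the browser and the server down, even on error.
  on.exit({
    remote$close()
    driver$server$stop()
  }, add = TRUE)

  remote$navigate(url)

  # Click the expand button so the page loads the detail table.
  button <- remote$findElement(using = "css selector", value = button_css)
  button$clickElement()
  Sys.sleep(wait_seconds)  # crude wait for the dynamic content to arrive

  # Hand the rendered HTML to rvest and return the cell text.
  page <- xml2::read_html(remote$getPageSource()[[1]])
  rvest::html_text(rvest::html_nodes(page, cell_css), trim = TRUE)
}

# Example (not run); the button selector here is hypothetical:
# cells <- scrape_after_click(url, button_css = ".your-plus-button-selector")
```

A fixed Sys.sleep is the simplest way to wait; polling until the cells appear would be more robust for slow responses.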

You can also use PhantomJS. It works like Selenium but is headless, so no visible browser is required.
I hope one of these helps.

Harun ERGUL answered Nov 15 '22 07:11