 

HTML table does not show on source file

I'm trying to scrape table data on a webpage using R (package rvest). To do that, the data needs to be in the html source file (that's where rvest looks for it apparently), but in this case it isn't.

However, data elements are shown in the Inspect panel's Elements view:

The table's elements are shown in the Elements view of the Inspect panel

Source file shows an empty table:

view-source file shows an empty table

Why is the data shown in Inspect Element but not in the source file? How can I access the table data in HTML format? If I can't access it through HTML, how should I change my web-scraping strategy?

The web page is https://si3.bcentral.cl/siete/secure/cuadros/cuadro_dinamico.aspx?idMenu=IPC_VAR_MEN1_HIST&codCuadro=IPC_VAR_MEN1_HIST

Source file: view-source:https://si3.bcentral.cl/siete/secure/cuadros/cuadro_dinamico.aspx?idMenu=IPC_VAR_MEN1_HIST&codCuadro=IPC_VAR_MEN1_HIST


EDIT: a solution using R is appreciated

asked Dec 08 '18 by David Jorquera




2 Answers

I really wish 'experts' would stop with the "you need Selenium/Headless Chrome" advice, since it's almost never true and it introduces a needless, heavyweight third-party dependency into data science workflows.

The site is an ASP.NET site, so it makes heavy use of sessions, and the programmers behind this particular one force that session to start at the home page. ("Hello, 2000 called and would like their session-state preservation model back.")

Anyway, we need to start there and progress to your page. Here's what that looks like to your browser:

(screenshot: the DevTools Network panel showing the sequence of requests the browser makes)

We can also see from the requests above that the site returns lovely JSON, so that's what we'll eventually grab. Let's start modeling the above session as an R httr workflow:

library(xml2)
library(httr)
library(rvest)

Start at the, um, start!

httr::GET(
  url = "https://si3.bcentral.cl/Siete/secure/cuadros/home.aspx",
  httr::verbose()
) -> res

Now, we need to get the HTML from that page as there are a number of hidden values we need to supply to the POST that is made since that's part of how the brain-dead ASP.NET workflow works (again, follow the requests in the image above):

pg <- httr::content(res)

hinput <- html_nodes(pg, "input")
hinput <- as.list(setNames(html_attr(hinput, "value"), html_attr(hinput, "name")))
hinput$`header$txtBoxBuscador` <- ""
hinput$`__EVENTARGUMENT` <- ""
hinput$`__EVENTTARGET` <- "lnkBut01"

httr::POST(
  url = "https://si3.bcentral.cl/Siete/secure/cuadros/home.aspx",
  httr::add_headers(
    `Referer` = "https://si3.bcentral.cl/Siete/secure/cuadros/home.aspx"
  ),
  encode = "form",
  body = hinput
) -> res

Now we've done what we need to con the website into thinking we've got a proper session so let's make the request for the JSON content:

httr::GET(
  url = "https://si3.bcentral.cl/siete/secure/cuadros/actions.aspx",
  httr::add_headers(
    `X-Requested-With` = "XMLHttpRequest"
  ),
  query = list(
    Opcion = "1",
    idMenu = "IPC_VAR_MEN1_HIST",
    codCuadro = "IPC_VAR_MEN1_HIST",
    DrDwnAnioDesde = "",
    DrDwnAnioHasta = "",
    DrDwnAnioDiario = "",
    DropDownListFrequency = "",
    DrDwnCalculo = "NONE"
  )
) -> res

And, boom:

str(
  httr::content(res), 1
)

## List of 32
##  $ CodigoCuadro       : chr "IPC_VAR_MEN1_HIST"
##  $ Language           : chr "es-CL"
##  $ DescripcionCuadro  : chr "IPC, IPCX, IPCX1 e IPC SAE, variación mensual, información histórica"
##  $ AnioDesde          : int 1928
##  $ AnioHasta          : int 2018
##  $ FechaInicio        : chr "01-01-2010"
##  $ FechaFin           : chr "01-11-2018"
##  $ ListaFrecuencia    :List of 1
##  $ FrecuenciaDefecto  : NULL
##  $ DrDwnAnioDesde     :List of 3
##  $ DrDwnAnioHasta     :List of 3
##  $ DrDwnAnioDiario    :List of 3
##  $ hsDecimales        :List of 1
##  $ ListaCalculo       :List of 1
##  $ Metadatos          : chr " <img runat=\"server\" ID=\"imgButMetaDatos\" alt=\"Ver metadatos\" src=\"../../Images/lens.gif\" OnClick=\"jav"| __truncated__
##  $ NotasPrincipales   : chr ""
##  $ StatusTextBox      : chr ""
##  $ Grid               :List of 4
##  $ GridColumnNames    :List of 113
##  $ Paginador          : int 15
##  $ allowEmptyColumns  : logi FALSE
##  $ FechaInicioSelected: chr "2010"
##  $ FechaFinSelected   : chr "2018"
##  $ FrecuenciaSelected : chr "MONTHLY"
##  $ CalculoSelected    : chr "NONE"
##  $ AnioDiarioSelected : chr "2010"
##  $ UrlFechaBase       : chr "Indizar_fechaBase.aspx?codCuadro=IPC_VAR_MEN1_HIST"
##  $ FechaBaseCuadro    : chr "Ene 2010"
##  $ IsBoletin          : logi FALSE
##  $ CheckSelected      :List of 4
##  $ lnkButFechaBase    : logi FALSE
##  $ ShowFechaBase      : logi FALSE

Dig around in the JSON for the data you need. I think it's in the Grid… elements.
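As a starting point for that digging, here is a minimal sketch in R. It only inspects fields that appear in the str() output above; the internal layout of Grid is not shown there, so check it yourself before relying on any field names:

```r
library(httr)

# Continuing from the `res` object returned by the final GET above.
dat <- httr::content(res)

# Inspect the components that look like they hold the table data:
str(dat$Grid, 2)             # the grid itself (a list of 4 components)
str(dat$GridColumnNames, 1)  # the 113 column labels for the table
```

From there, the usual pattern is to flatten the relevant list (e.g. with purrr or do.call(rbind, ...)) into a data frame once you know which of the four Grid components holds the rows.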

answered Oct 08 '22 by hrbrmstr


The data is more than likely loaded dynamically from a data source or API after the initial page load, which is why it appears in the Elements panel but not in the page source. Use the browser DevTools Network tab to find the request that fetches the data, then send a GET request to that endpoint directly, or scrape the page only after the data has finished loading.

answered Oct 08 '22 by Brady Ward