How to parse javascript data list with R

Question

I use R to parse HTML, and I would like to know the most efficient way to parse the following code:

<script type="text/javascript">
var utag_data = {
  environnement : "prod",
  device : getDevice(),
  displaytype : getDisplay($(window).innerWidth()),
  pagename : "adview",
  pagetype : "annonce"}</script>

I started to do this:

infos = unlist(xpathApply(page,
                          '//script[@type="text/javascript"]',
                          xmlValue))
infos=gsub('\n|  ','',infos)
infos=gsub("var utag_data = ","",infos)
fromJSON(infos)

And the code above returns something really weird:

$nvironnemen
[1] "prod"

$evic
NULL

$isplaytyp
NULL

$agenam
[1] "adview" etc.

I would like to know how to do this efficiently: how can I parse the data list in the JavaScript directly? Thank you.

hrbrmstr · Accepted Answer

I didn't try your code, but I think your gsub() regexes might be overaggressive (which is probably causing the name munging).
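One way to check that guess is to print the string after each substitution. The sketch below uses only base R and a shortened, hypothetical copy of the script text; the names raw, step1 and step2 are illustrative, and the second step uses sub() with an anchored pattern instead of the gsub() call from the question.

```r
# Shortened, hypothetical copy of the script body from the question
raw <- 'var utag_data = {\n  environnement : "prod",\n  pagename : "adview"}'

# Step 1: drop newlines and double spaces, as in the question
step1 <- gsub("\n|  ", "", raw)

# Step 2: strip only the leading assignment (anchored, so nothing else is touched)
step2 <- sub("^var utag_data = ", "", step1)

# Inspect what would be handed to the JSON parser
cat(step2, "\n")
```

In this sketch the key names survive both substitutions but stay unquoted, so the result is still a JavaScript object literal rather than strict JSON (which requires double-quoted keys). That may be contributing to the odd names as much as the regexes are.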

It's possible to run JavaScript code using the V8 package, but it won't be able to execute the DOM-based getDevice() and getDisplay() functions since they don't exist in the V8 engine:

library(V8)
library(rvest)

pg <- read_html('<script type="text/javascript">
var utag_data = {
  environnement : "prod",
  device : getDevice(),
  displaytype : getDisplay($(window).innerWidth()),
  pagename : "adview",
  pagetype : "annonce"}</script>')


script <- html_text(html_nodes(pg, xpath='//script[@type="text/javascript"]'))

ctx <- v8()

ctx$eval(script)
## Error: ReferenceError: getDevice is not defined

However, you can compensate for that:

# we need to remove the function calls and replace them with blanks
# since both begin with 'getD' this is pretty easy:
script <- gsub("getD[[:alpha:]\\(\\)\\$\\.]+,", "'',", script)

ctx$eval(script)
ctx$get("utag_data")

## $environnement
## [1] "prod"
## 
## $device
## [1] ""
## 
## $displaytype
## [1] ""
## 
## $pagename
## [1] "adview"
## 
## $pagetype
## [1] "annonce"
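If only the string-literal fields matter, a JavaScript engine may not be needed at all. The following is a minimal base-R sketch of a different technique (plain regex extraction, not the V8 approach above). It assumes every value of interest is a simple double-quoted string with no escaped quotes inside; the names m, keys, vals and utag are illustrative.

```r
# Script text as in the question (function-call fields are simply skipped)
script <- 'var utag_data = {
  environnement : "prod",
  device : getDevice(),
  displaytype : getDisplay($(window).innerWidth()),
  pagename : "adview",
  pagetype : "annonce"}'

# Find every  key : "value"  pair whose value is a double-quoted string
pattern <- '[[:alnum:]_]+[[:space:]]*:[[:space:]]*"[^"]*"'
m <- regmatches(script, gregexpr(pattern, script))[[1]]

# Split each match into its key and its unquoted value
keys <- sub('[[:space:]]*:.*$', '', m)
vals <- sub('^[^"]*"([^"]*)"$', '\\1', m)

# Named list, similar in shape to what ctx$get() returns
utag <- setNames(as.list(vals), keys)
str(utag)
```

This drops device and displaytype entirely instead of returning empty strings, so it only suits cases where the computed fields are not needed.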

Tags: javascript, r, web-scraping