Scrape website's Power BI dashboard using R

Tags:

I have been trying to scrape my local government's Power BI dashboard using R but it seems like it might be impossible. I've read from the Microsoft site that it is not possible to scrable Power BI dashboards but I am going through several forums showing that it is possible, however I am going through a loop

I am trying to scrape the Zip Code tab data from this dashboard:

https://app.powerbigov.us/view?r=eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYyIiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9&pageName=ReportSectionb438b98829599a9276e2&pageName=ReportSectionb438b98829599a9276e2

I've tried several "techniques" from the given code below

scc_webpage <- xml2::read_html("https://app.powerbigov.us/view?r=eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYyIiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9&pageName=ReportSectionb438b98829599a9276e2&pageName=ReportSectionb438b98829599a9276e2")


# Attempt using xpath
scc_webpage %>% 
  rvest::html_nodes(xpath = '//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas-modern/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container-group/transform/div/div[2]/visual-container-modern[1]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[1]/div[4]/div/div/div[1]/div[1]') %>% 
  rvest::html_text()

# Attempt using div.<class>
scc_webpage %>% 
  rvest::html_nodes("div.pivotTableCellWrap cell-interactive tablixAlignRight ") %>% 
  rvest::html_text()

# Attempt using xpathSapply
query = '//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas-modern/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container-group/transform/div/div[2]/visual-container-modern[1]/transform/div/div[3]/div/visual-modern/div/div/div[2]/div[1]/div[4]/div/div/div[1]/div[1]'
XML::xpathSApply(xml, query, xmlValue)

scc_webpage %>% 
  html_nodes("ui-view")

But I always either get an output saying character(0) when using xpath and getting the div class and id, or even {xml_nodeset (0)} when trying to go through html_nodes. The weird thing is that it wouldn't show the whole html of the tableau data when I do:

scc_webpage %>% 
  html_nodes("div")

And this would be the output, leaving the chunk that I needed blank:

{xml_nodeset (2)}
[1] <div id="pbi-loading"><svg version="1.1" class="pulsing-svg-item" xmlns="http://www.w3.org/2000/svg" xmlns:xlink ...
[2] <div id="pbiAppPlaceHolder">\r\n        <ui-view></ui-view><root></root>\n</div>

I guess the issue may be because the numbers are within a series of nested div attributes??

The main data I am trying to get are the numbers from the table showing the Zip code, confirmed cases, % total cases, deaths, % total deaths.

If this is possible to do in R or possibly in Python using Selenium, any help with this would be greatly appreciated!!

201

asked Oct 09 '20 19:10

Doughey

1 Answers

The problem is that the site you want to analyze relies on JavaScript to run and fetch the content for you. In such a case, httr::GET is of no help to you.
However, since manual work is also not an option, we have Selenium.

The following does what you're looking for:

library(dplyr)
library(purrr)
library(readr)

library(wdman)
library(RSelenium)
library(xml2)
library(selectr)

# using wdman to start a selenium server
selServ <- selenium(
  port = 4444L,
  version = 'latest',
  chromever = '84.0.4147.30', # set this to a chrome version that's available on your machine
)

# using RSelenium to start chrome on the selenium server
remDr <- remoteDriver(
  remoteServerAddr = 'localhost',
  port = 4444L,
  browserName = 'chrome'
)

# open a new Tab on Chrome
remDr$open()

# navigate to the site you wish to analyze
report_url <- "https://app.powerbigov.us/view?r=eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYyIiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9&pageName=ReportSectionb438b98829599a9276e2&pageName=ReportSectionb438b98829599a9276e2"
remDr$navigate(report_url)

# find and click the button leading to the Zip Code data
zipCodeBtn <- remDr$findElement('.//button[descendant::span[text()="Zip Code"]]', using="xpath")
zipCodeBtn$clickElement()

# fetch the site source in XML
zipcode_data_table <- read_html(remDr$getPageSource()[[1]]) %>%
  querySelector("div.pivotTable")

Now we have the page source read into R, probably what you had in mind when you started your scraping task.
From here on it's smooth sailing and merely about converting that xml to a useable table:

col_headers <- zipcode_data_table %>%
  querySelectorAll("div.columnHeaders div.pivotTableCellWrap") %>%
  map_chr(xml_text)

rownames <- zipcode_data_table %>%
  querySelectorAll("div.rowHeaders div.pivotTableCellWrap") %>%
  map_chr(xml_text)

zipcode_data <- zipcode_data_table %>%
  querySelectorAll("div.bodyCells div.pivotTableCellWrap") %>%
  map(xml_parent) %>%
  unique() %>%
  map(~ .x %>% querySelectorAll("div.pivotTableCellWrap") %>% map_chr(xml_text)) %>%
  setNames(col_headers) %>%
  bind_cols()

# tadaa
df_final <- tibble(zipcode = rownames, zipcode_data) %>%
  type_convert(trim_ws = T, na = c(""))

The resulting df looks like this:

> df_final
# A tibble: 15 x 5
   zipcode `Confirmed Cases ` `% of Total Cases ` `Deaths ` `% of Total Deaths `
   <chr>                <dbl> <chr>                   <dbl> <chr>               
 1 63301                 1549 17.53%                     40 28.99%              
 2 63366                 1364 15.44%                     38 27.54%              
 3 63303                 1160 13.13%                     21 15.22%              
 4 63385                 1091 12.35%                     12 8.70%               
 5 63304                 1046 11.84%                      3 2.17%               
 6 63368                  896 10.14%                     12 8.70%               
 7 63367                  882 9.98%                       9 6.52%               
 8                        534 6.04%                       1 0.72%               
 9 63348                  105 1.19%                       0 0.00%               
10 63341                   84 0.95%                       1 0.72%               
11 63332                   64 0.72%                       0 0.00%               
12 63373                   25 0.28%                       1 0.72%               
13 63386                   17 0.19%                       0 0.00%               
14 63357                   13 0.15%                       0 0.00%               
15 63376                    5 0.06%                       0 0.00%

answered Sep 27 '22 23:09

alex_jwb90

Related questions
                            
                                vertical text in table cell using css3 writing-mode in Chrome
                            
                                Angular 8 forms, ngSubmit returns empty data object
                            
                                Random Position Choosing for Computer in Tic-Tac-Toe not Working How It's Supposed to
                            
                                Content shifts slightly on page reload with Chrome
                            
                                Sticky navigation bar stops working due to content underneath [duplicate]
                            
                                Uncaught TypeError: Cannot read property 'then' of undefined SweetAlert & jquery
                            
                                LocalStorage updating element causes duplication with JSON in JQuery
                            
                                How to pass authentication details to application inside iframe?
                            
                                How can I get source files from a compiled Electron application?
                            
                                How to align icon left and text centred in a html button?
                            
                                Make monospaced text as big as possible without causing overflow or wrapping
                            
                                How to test HTML content with React testing library
                            
                                How to set the Jinja environment variable in Flask?
                            
                                How to wrap text around a shape with border-radius?
                            
                                CSS Grid Stadium Shape
                            
                                Why is this SVG mask animation choppy in Firefox but smooth in Chrome?
                            
                                Connecting input range arc thumb to slider with curve
                            
                                Getting an error 'No files matching the pattern were found' when using prettier
                            
                                flex container ruining a border-radius, How to fix this?
                            
                                Flexbox grid with maximum 3 columns and minimum 2? With equal width items?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Scrape website's Power BI dashboard using R

Tags:

html

r

web-scraping

Doughey

People also ask

1 Answers

alex_jwb90

Recent Activity

Donate For Us