Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scraping a PowerBI dashboard report: expanding rows and loading/saving values in a scrolling table

Objective:

I'm trying to scrape a PowerBI report for "Reported deliveries >> By number of vaccines" in the the "Delivery" page on this public PowerBI dashboard using Selenium to get table listing all countries, with the name of the vaccines delivered, and the number of doses delivered.

Target and setup:

Selecting the view, expanding the rows

This screencap shows the PowerBI report that I'm trying to scrape. After clicking on "Delivery" on the landing page and selecting "Reported deliveries >> By number of vaccines", there is a table that loads up on the bottom half of the page as you can see in this image. You can right-click anywhere on the table to expand all the rows, which are country names. Beneath each country name, is the name of the vaccine delivered in one column, and the "Doses delivered" in the adjacent column. There are 100+ rows with country names, and these load dynamically as you scroll down through this table.

Progress so far:

I've managed to start a Chrome headless browser, get to the "Delivery" page by clicking on the relevant button, and then open the correct report that I want by clicking on the relevant tick box in the "View option" menu on the left. After this I'm stuck. I can't figure out how to expand all of the rows in this dynamically rendered table, and I also can't figure out how to scroll down through the table and save the country names, along with their respective vaccine names and number of doses delivered.

With regards to expanding all of the Country name rows, I've seen a few posts here in StackOverflow suggesting the use of Selenium's Context Click in the Actions class, but I can't seem to figure out how to apply it to my example.

With regards to the scrolling problem, I'm at a bit of a loss here as well. I tried, in vain, to find the underlying API where data was being called for the table, but my very novice eyes were not able to spot anything, and I'm wondering how else to scroll all the way down the table to scrape this data.

Any help, hints, or suggestions would be greatly appreciated!

Code up to now

from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions

webdriver = "C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe"
options = ChromeOptions()
driver = Chrome(webdriver, options = options)

url = 'https://app.powerbi.com/view?r=eyJrIjoiNmE0YjZiNzUtZjk2OS00ZTg4LThlMzMtNTRhNzE0NzA4YmZlIiwidCI6Ijc3NDEwMTk1LTE0ZTEtNGZiOC05MDRiLWFiMTg5MjAyMzY2NyIsImMiOjh9&pageName=ReportSectiona329b3eafd86059a947b'

driver.get(url)
  
#click on "Delivery" button
driver.find_element_by_xpath('//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas-modern/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container-modern[11]/transform').click()

#click on "Reported deliveries >> By number of vaccines" click box
driver.find_element_by_xpath('//*[@id="pvExplorationHost"]/div/div/exploration/div/explore-canvas-modern/div/div[2]/div/div[2]/div[2]/visual-container-repeat/visual-container-modern[8]/transform/div/div[3]/div/visual-modern/div/button').click()
like image 973
anguyen1210 Avatar asked Nov 06 '22 01:11

anguyen1210


1 Answers

Update: I found a way to scrape all the data from the table. What I found works, is to scrape 11 elements at a time, locate the scrollbar element and click just below it. The code is a little rough. Right now I am just iterating 70 times through the grab 11 elements => click on scrollbar workflow, which gets you.... most of the way through the table. I used the following code:

from selenium.webdriver import Chrome
from selenium.webdriver import ChromeOptions
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains

webdriver = "C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe"
options = ChromeOptions()
driver = Chrome(webdriver, options = options)

def wait_for_element(selector_type, selector):
    return WebDriverWait(driver, 20).until(EC.element_to_be_clickable((selector_type, selector)))

url = 'https://app.powerbi.com/view?r=eyJrIjoiNmE0YjZiNzUtZjk2OS00ZTg4LThlMzMtNTRhNzE0NzA4YmZlIiwidCI6Ijc3NDEwMTk1LTE0ZTEtNGZiOC05MDRiLWFiMTg5MjAyMzY2NyIsImMiOjh9&pageName=ReportSectiona329b3eafd86059a947b'

driver.get(url)
wait_for_element(By.XPATH, '//visual-container-modern[11]//button').click()
wait_for_element(By.XPATH, '//visual-container-modern[8]//button').click()
table_element = wait_for_element(By.CSS_SELECTOR, 'div.expandableContent.pivotTableCellWrap')
action = ActionChains(driver)
action.context_click(table_element)
action.perform()
wait_for_element(By.CSS_SELECTOR, 'div[title="Expand"]').click()
wait_for_element(By.CSS_SELECTOR, 'div[title="All"]').click()
time.sleep(1)
for x in range(70):
    first_column = driver.find_elements_by_css_selector('div.expandableContent.pivotTableCellWrap')
    second_column = driver.find_elements_by_css_selector('div.pivotTableCellWrap.cell-interactive.tablixAlignRight')
    for i in range(11):
        print(first_column[i].text + ': ' + second_column[i].text)
    actions = ActionChains(driver)
    actions.move_to_element_with_offset(driver.find_elements_by_css_selector('div.scroll-bar-part-bar')[1], 0, 55)
    actions.click()
    actions.perform()

and the output is like this:

China: 275,338,000.00
 : 275,338,000.00
USA: 264,505,725.00
Pfizer/BioNTech: 134,063,475.00
Moderna: 112,801,100.00
Janssen: 17,641,150.00
India: 154,211,511.00
 : 154,211,511.00
Indonesia: 60,000,000.00
Sinovac: 60,000,000.00
Brazil: 44,100,000.00
Sinovac: 275,338,000.00
AstraZeneca: 275,338,000.00
Germany: 264,505,725.00
Pfizer/BioNTech: 134,063,475.00
AstraZeneca: 112,801,100.00
Moderna: 17,641,150.00
Spain: 154,211,511.00
Pfizer/BioNTech: 154,211,511.00
AstraZeneca: 60,000,000.00
Moderna: 60,000,000.00
Janssen: 44,100,000.00
Italy: 6,563,100.00
Pfizer/BioNTech: 146,400.00
AstraZeneca: 23,705,865.00
Moderna: 16,352,485.00
Janssen: 5,277,380.00
France: 1,896,200.00
Pfizer/BioNTech: 179,800.00
AstraZeneca: 23,351,295.00
Moderna: 15,253,095.00
Janssen: 5,407,600.00
Turkey: 2,484,200.00
 : 6,563,100.00
Mexico: 146,400.00
AstraZeneca: 23,705,865.00
Pfizer/BioNTech: 16,352,485.00
CanSino Biologicals: 5,277,380.00
Sinovac: 1,896,200.00
Gamaleya Research Institute: 179,800.00
Russia: 23,351,295.00
 : 15,253,095.00
Chile: 5,407,600.00
 : 2,484,200.00
 : 20,070,743.00
Poland: 20,070,743.00
Pfizer/BioNTech: 14,848,435.00
AstraZeneca: 14,848,435.00
Moderna: 13,825,476.00
Janssen: 13,825,476.00
UAE: 11,879,200.00
 : 7,815,600.00
Israel: 2,695,200.00
 : 1,250,800.00
Saudi Arabia: 117,600.00
 : 20,070,743.00
Morocco: 20,070,743.00
 : 14,848,435.00
Bangladesh: 14,848,435.00
AstraZeneca: 13,825,476.00
Argentina: 13,825,476.00
Gamaleya Research Institute: 11,879,200.00
Beijing Institute of Biological Products (CNBG): 7,815,600.00
AstraZeneca: 2,695,200.00
Netherlands: 1,250,800.00
Pfizer/BioNTech: 117,600.00
Pfizer/BioNTech: 9,100,000.00
AstraZeneca: 8,000,245.00
Moderna: 5,420,245.00
Janssen: 2,000,000.00
Romania: 580,000.00
Pfizer/BioNTech: 7,112,110.00
AstraZeneca: 4,463,550.00
Moderna: 1,852,960.00
Hungary: 716,400.00
 : 79,200.00
Colombia: 6,365,530.00
 : 9,100,000.00
Portugal: 8,000,245.00
Pfizer/BioNTech: 5,420,245.00
AstraZeneca: 2,000,000.00
Moderna: 580,000.00
Philippines: 7,112,110.00
Sinovac: 4,463,550.00
Belgium: 1,852,960.00
 : 716,400.00
South Korea: 79,200.00
 : 6,365,530.00
Pfizer/BioNTech: 945,600.00
AstraZeneca: 339,600.00
Moderna: 4,000,000.00
Serbia: 4,000,000.00
 : 3,902,903.00
Japan: 3,902,903.00
 : 3,633,353.00
Austria: 3,633,353.00
Pfizer/BioNTech: 3,632,335.00
AstraZeneca: 2,432,235.00
Moderna: 878,500.00
Janssen: 16,800.00
Czech Rep.: 3,201,704.00
 : 3,201,704.00
Greece: 3,115,339.00
 : 3,115,339.00
Switzerland: 2,629,843.00
 : 2,629,843.00
Cambodia: 2,411,279.00
 : 2,411,279.00
Australia: 2,254,074.00
 : 2,254,074.00
Singapore: 16,800.00
 : 3,201,704.00
Thailand: 3,201,704.00
Sinovac: 3,115,339.00
AstraZeneca: 3,115,339.00
Nepal: 2,629,843.00
AstraZeneca: 2,629,843.00
Pakistan: 2,411,279.00
 : 2,411,279.00
Denmark: 2,254,074.00
 : 2,254,074.00
Hong Kong SAR, China: 2,007,041.00
Sinovac: 2,007,041.00
Dominican Rep.: 2,000,000.00
 : 2,000,000.00
Ireland: 1,982,185.00
Pfizer/BioNTech: 1,982,185.00
AstraZeneca: 1,926,330.00
Moderna: 1,203,930.00
Janssen: 564,000.00
Slovakia: 144,000.00
Pfizer/BioNTech: 14,400.00
Gamaleya Research Institute: 2,007,041.00
Moderna: 2,007,041.00
Finland: 2,000,000.00
Pfizer/BioNTech: 2,000,000.00
AstraZeneca: 1,982,185.00
Moderna: 1,982,185.00
Uruguay: 1,926,330.00
 : 1,203,930.00
Norway: 564,000.00
 : 144,000.00
Peru: 14,400.00
 : 1,834,409.00
Lithuania: 1,834,409.00
Pfizer/BioNTech: 1,700,785.00
AstraZeneca: 1,700,785.00
Moderna: 1,663,307.00
Qatar: 1,663,307.00
 : 1,633,275.00
Kazakhstan: 1,204,005.00
 : 319,252.00
Azerbaijan: 110,018.00
 : 1,603,913.00
Hong Kong: 1,834,409.00
 : 1,834,409.00
Bulgaria: 1,700,785.00
AstraZeneca: 1,700,785.00
Pfizer/BioNTech: 1,663,307.00
Moderna: 1,663,307.00
Janssen: 1,633,275.00
Malaysia: 1,204,005.00
 : 319,252.00
Zimbabwe: 110,018.00
Sinovac: 1,603,913.00
Beijing Institute of Biological Products (CNBG): 629,420.00
Mongolia: 169,200.00
 : 14,400.00
Egypt: 1,453,382.00
Beijing Institute of Biological Products (CNBG): 1,453,382.00
AstraZeneca: 1,400,000.00
Bahrain: 1,000,000.00
 : 400,000.00
Nigeria: 1,398,592.00
 : 1,398,592.00
Jordan: 1,350,000.00
Jordan: 629,420.00
Pfizer/BioNTech: 169,200.00
Beijing Institute of Biological Products (CNBG): 14,400.00
South Africa: 1,453,382.00
AstraZeneca: 1,453,382.00
Janssen: 1,400,000.00
Iran: 1,000,000.00
 : 400,000.00
Myanmar: 1,398,592.00
 : 1,398,592.00
Ecuador: 1,350,000.00
Ecuador: 1,080,000.00
 : 1,000,000.00
Sri Lanka: 80,000.00
 : 1,068,123.00
Kuwait: 1,068,123.00
 : 1,040,000.00
UK: 1,040,000.00
Pfizer/BioNTech: 1,034,338.00
Uzbekistan: 1,034,338.00
Anhui Zhifei Longcom Biopharmaceutical: 1,016,195.00
Costa Rica: 1,016,195.00
Pfizer/BioNTech: 1,080,000.00
Croatia: 1,000,000.00
Pfizer/BioNTech: 80,000.00
AstraZeneca: 1,068,123.00
Moderna: 1,068,123.00
Janssen: 1,040,000.00
El Salvador: 1,040,000.00
 : 1,034,338.00
Kenya: 1,034,338.00
 : 1,016,195.00
Ghana: 1,016,195.00
 : 457,470.00
Bolivia: 319,200.00
 : 186,128.00
Ukraine: 7,150.00
AstraZeneca: 915,725.00
Sinovac: 915,725.00
Pfizer/BioNTech: 853,081.00
Venezuela: 853,081.00
Beijing Institute of Biological Products (CNBG): 849,527.00
Gamaleya Research Institute: 849,527.00
Panama: 848,149.00
Panama: 457,470.00
 : 319,200.00
Slovenia: 186,128.00
Pfizer/BioNTech: 7,150.00
AstraZeneca: 915,725.00
Moderna: 915,725.00
Estonia: 853,081.00
Pfizer/BioNTech: 853,081.00
AstraZeneca: 849,527.00
Moderna: 849,527.00
Janssen: 848,149.00
 : 687,653.00
Algeria: 671,100.00
Gamaleya Research Institute: 409,500.00
Albania: 184,800.00

... and so on.

like image 188
C. Peck Avatar answered Nov 15 '22 10:11

C. Peck