I'm working on scraping a private Tableau dashboard from a vendor and cannot seem to select or use the embedded scrollbars that Tableau renders. I've tried scrolling, scrolling into view, and grabbing the scrollbar directly with JavaScript.
An example of the scrollbar I've encountered can be found at:
https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y
The XPath I am using is:
/html/body/div[2]/div[3]/div[1]/div[1]/div/div[2]/div[4]/div/div/div/div/div[2]/div/div/div/div[1]/div[20]
I've attempted the options found here, here, and here.
I cannot seem to actually grab the scrollbar itself. The best I've been able to do is click the entire bar.
How can I advance this scrollbar to bring IDs into view as I iterate over them?
import os, sys, shutil, logging, os.path
import time  # needed for the time.sleep() calls below
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
from selenium.webdriver.chrome.options import Options
from azure.storage.blob import BlockBlobService
url = 'https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y'
PATH = "/Users/171644/python_tools/chromedriver" #change this
options = Options()
driver = webdriver.Chrome(PATH,options=options)
wait = WebDriverWait(driver, 120)
driver.get(url)
time.sleep(5)
driver.fullscreen_window()
time.sleep(10)
element = driver.find_element_by_id('10671917940_0')
actions = ActionChains(driver)
actions.move_to_element(element).perform()
This will not work as written because the element you are trying to reach sits inside an iframe served from a different domain, so it is not part of the top-level document that your locator searches. You can read more about this under the Same-Origin Policy.
Additionally, this approach is likely to be slow and flaky for several reasons: embedded Tableau workbooks are rendered inside an iframe (you will have to locate each individual iframe), and rendering happens asynchronously through AJAX calls, so you will need a lot of explicit waits.
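If you do want to stay with Selenium, the two problems above (iframes and asynchronous rendering) could be handled roughly as in the sketch below. It is only a sketch under assumptions: `find_in_frames` and `scroll_by` are hypothetical helper names, the CSS selector you pass in is something you would have to find yourself in the browser's dev tools, and I have not verified that Tableau's custom scrollbars react to `scrollTop` changes. The hand-rolled polling loop stands in for `WebDriverWait` so the helper depends only on the driver object; the strings `"css selector"` and `"tag name"` are the values behind `By.CSS_SELECTOR` and `By.TAG_NAME`.

```python
import time


def find_in_frames(driver, css_selector, timeout=30.0, poll=0.5):
    """Poll the top document and every first-level iframe for css_selector.

    Returns the first match and leaves the driver switched into the
    frame that contains it. Raises TimeoutError if nothing shows up.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Always start each pass from the top-level document.
        driver.switch_to.default_content()
        found = driver.find_elements("css selector", css_selector)
        if found:
            return found[0]
        # Look inside each iframe; elements in a frame are invisible
        # to locators until the driver has switched into that frame.
        for frame in driver.find_elements("tag name", "iframe"):
            driver.switch_to.frame(frame)
            found = driver.find_elements("css selector", css_selector)
            if found:
                return found[0]
            driver.switch_to.default_content()
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css_selector!r}")
        time.sleep(poll)  # rendering is asynchronous, so try again


def scroll_by(driver, container, pixels):
    """Scroll a container element down by a number of pixels via JavaScript."""
    driver.execute_script(
        "arguments[0].scrollTop += arguments[1];", container, pixels
    )
```

You would call `find_in_frames(driver, "<your selector>")` once the page has loaded, then call `scroll_by(driver, element, 200)` in a loop while checking whether the next ID has come into view. Nested iframes and stale element references are not handled here.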
I would advise using a scraping tool instead.
Here is a short code snippet in case you want to follow up on that option.
from tableauscraper import TableauScraper as TS
url = "https://public.tableau.com/views/WorldIndicators-TableauGeneralExample/Story?%3Aembed=y&%3AshowVizHome=no&%3AshowTabs=y&%3Adisplay_count=y&%3Adisplay_static_image=y"
ts = TS()
ts.loads(url)
workbook = ts.getWorkbook()
for t in workbook.worksheets:
    print(f"worksheet name : {t.name}")  # show worksheet name
    print(t.data)  # show dataframe for this worksheet
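Once the worksheets load, you will probably want to keep the data rather than print it. Here is a minimal sketch, assuming each worksheet's `data` attribute is a pandas DataFrame (so that `to_csv` is available) and that `name` is a string; `export_worksheets` and the `tableau_export` directory are hypothetical names chosen for illustration.

```python
import re
from pathlib import Path


def export_worksheets(worksheets, out_dir="tableau_export"):
    """Write each worksheet's DataFrame to its own CSV file.

    Returns the list of paths written, in worksheet order.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, sheet in enumerate(worksheets):
        # Worksheet names can contain spaces or slashes, so reduce
        # them to a safe file stem; fall back to a generic name.
        stem = re.sub(r"[^A-Za-z0-9_-]+", "_", sheet.name).strip("_")
        stem = stem or "worksheet"
        # The index prefix keeps files unique if two names collide.
        target = out_path / f"{index:02d}_{stem}.csv"
        sheet.data.to_csv(target, index=False)  # assumes a pandas DataFrame
        written.append(target)
    return written
```

With the snippet above you would call `export_worksheets(workbook.worksheets)` after `ts.getWorkbook()`.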