Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - Manipulate and read browser from current browser

I am struggling to find a method in python which allows you to read data in a currently used web browser. Effectively, I am trying to download a massive dataframe of data on a locally controlled company webpage and implement it into a dataframe. The issue is that the website has a fairly complex authentication token process which I have not been able to bypass using Selenium using a slew of webdrivers, Requests, urllib, and cookielib using a variety of user parameters. I have given up on this front entirely as I am almost positive that there is more to the authentication process than can be achieved easily with these libraries.

However, I did manage to bypass the required tokenization process when I quickly tested opening a new tab in a current browser which was already logged in using WebBrowser. Classically, WebBrowser does not offer a read function meaning that even though the page can be opened the data on the page cannot be read into a pandas dataframe. This got me thinking I could use Win32com, open a browser, login, then run the rest of the script, but again, there is no general read ability of the dispatch for internet explorer meaning I can't send the information I want to pandas. I'm stumped. Any ideas?

I could acquire the necessary authentication token scripts, but I am sure that it would take a week or two before anything would happen on that front. I would obviously prefer to get something in the mean time while I wait for the actual auth scripts from the company.

Update: I received authentication tokens from the company, however it requires using a python package on another server I do not have access too, mostly because its an oddity that I am using Python in my department. Thus the above still applies - need a method for reading and manipulating an open browser.

like image 901
WolVes Avatar asked Oct 10 '17 18:10

WolVes


People also ask

Can Python control a Web browser?

In this article, we are going to see how to control the web browser with Python using selenium. Selenium is an open-source tool that automates web browsers. It provides a single interface that lets you write test scripts in programming languages like Ruby, Java, NodeJS, PHP, Perl, Python, and C#, etc.

How do I open a specific browser in Python?

To open a page in a specific browser, use the webbrowser. get() function to specify a particular browser.


1 Answers

Step-by-step

1) Start browser with Selenium.

2) Script should start waiting for certain element that inform you that you got required page and logged in.

3) You can use this new browser window to login to page manually.

4) Script detects that you are on required page and logged in.

5) Script processes page the way you like.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# start webdriver (opens Chrome in new window)
chrome = webdriver.Chrome()

# initialize waiter with maximum 300 seconds to wait.
waiter = WebDriverWait(chrome , 300)

# Will wait for appear of #logout element.
# I assume it shows that you are logged in.
wait.until(EC.presence_of_element_located(By.ID, "logout"))

# Extract data etc.

It might be easier if you use your Chrome user's profile. This way you may have previous session continued so you will not need to do any login actions.

options = webdriver.ChromeOptions() 
options.add_argument("user-data-dir=FULL_PATH__TO_PROFILE")
chrome = webdriver.Chrome(chrome_options=options)
chrome.get("https://your_page_here")
like image 188
Alexander C Avatar answered Nov 15 '22 18:11

Alexander C