I have problems combining two libraries in Python 3.6. I use the Selenium Firefox WebDriver to log in to a website, but when I then want BeautifulSoup or Requests to read that website, it fetches the link differently (it reads the page as if I had not logged in). How can I tell Requests that I have already logged in?
Below is the code I have written so far:
from selenium import webdriver
import config
import requests
from bs4 import BeautifulSoup
#choose webdriver
browser=webdriver.Firefox(executable_path="C:\\Users\\myUser\\geckodriver.exe")
browser.get("https://www.mylink.com/")
#log in
timeout = 1
login = browser.find_element_by_name("sf-login")
login.send_keys(config.USERNAME)
password = browser.find_element_by_name("sf-password")
password.send_keys(config.PASSWORD)
button_log = browser.find_element_by_xpath("/html/body/div[2]/div[1]/div/section/div/div[2]/form/p[2]/input")
button_log.click()
name = "https://www.policytracker.com/auctions/page/"
browser.get(name)
name2 = "/html/body/div[2]/div[1]/div/section/div/div[2]/div[3]/div[" + str(N) + "]/a"  # N is the item index, set elsewhere
#next page loaded
title1 = browser.find_element_by_xpath(name2)
title1.click()
page = browser.current_url  # URL of the page whose content I want to download (I am logged in on this page)
r = requests.get(page)  # I want Requests to fetch this page; it does, but without my logged-in session... WRONG
r.content
soup = BeautifulSoup(r.content, 'lxml')
print (soup)
If you simply want to pass the page source to BeautifulSoup, you can get the page source from Selenium and then pass it to BeautifulSoup directly (no need for the requests module).
Instead of
page = browser.current_url
r = requests.get(page)
soup = BeautifulSoup(r.content, 'lxml')
you can do
page = browser.page_source
soup = BeautifulSoup(page, 'html.parser')
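If you do need Requests itself (for example, to download files faster than driving the browser), you can copy the browser's login cookies into a `requests.Session`. Below is a minimal sketch; the helper name `session_from_driver` is my own, and it assumes `browser` is the already-logged-in Selenium WebDriver from your code:

```python
import requests

def session_from_driver(driver):
    """Build a requests.Session that reuses the browser's login cookies.

    Selenium's get_cookies() returns a list of dicts with at least
    'name' and 'value' keys; we copy each one into the session's jar.
    """
    session = requests.Session()
    for cookie in driver.get_cookies():
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain"),
        )
    return session

# Usage with your variables (browser must already be logged in):
# session = session_from_driver(browser)
# r = session.get(browser.current_url)  # fetched as the logged-in user
# soup = BeautifulSoup(r.content, 'lxml')
```

Note that this only works while the cookies are valid; if the site ties the session to other headers (e.g. the User-Agent), you may also need to set those on the `Session`.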