I have a problem combining two libraries in Python 3.6. I use the Selenium Firefox WebDriver to log into a website, but when I then fetch the same page with Requests and BeautifulSoup, the page is served as if I had not logged in. How can I tell Requests that I have already logged in?
Below is the code I have written so far:
from selenium import webdriver
import config
import requests
from bs4 import BeautifulSoup
#choose webdriver
browser=webdriver.Firefox(executable_path="C:\\Users\\myUser\\geckodriver.exe")
browser.get("https://www.mylink.com/")
#log in
timeout = 1
login = browser.find_element_by_name("sf-login")
login.send_keys(config.USERNAME)
password = browser.find_element_by_name("sf-password")
password.send_keys(config.PASSWORD)
button_log = browser.find_element_by_xpath("/html/body/div[2]/div[1]/div/section/div/div[2]/form/p[2]/input")
button_log.click()
name = "https://www.policytracker.com/auctions/page/"
browser.get(name)
name2 = "/html/body/div[2]/div[1]/div/section/div/div[2]/div[3]/div[" + str(N) + "]/a"  # N is a row index set elsewhere in my script
#next page loaded
title1 = browser.find_element_by_xpath(name2)
title1.click()
page = browser.current_url  # this saves the URL of the page whose content I want to download (I have already logged in on that page)
r = requests.get(page)  # I want Requests to fetch this page; it does, but without my logged-in session... WRONG
r.content
soup = BeautifulSoup(r.content, 'lxml')
print (soup)
If you simply want to pass the page source to BeautifulSoup, you can get the page source from Selenium and pass it to BeautifulSoup directly (no need for the requests module at all).
Instead of
page = browser.current_url
r = requests.get(page)
soup = BeautifulSoup(r.content, 'lxml')
you can do
page = browser.page_source
soup = BeautifulSoup(page, 'html.parser')
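If you do want to use requests itself (for example, to download files), the usual fix is to copy the session cookies from the Selenium browser into a requests.Session after logging in. Below is a minimal sketch of that approach, assuming the site tracks your login purely through cookies; browser here is the logged-in WebDriver from your code above:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
# copy every cookie from the logged-in Selenium browser into the requests session
for cookie in browser.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# requests now sends the same session cookies, so the page is served as logged in
r = session.get(browser.current_url)
soup = BeautifulSoup(r.content, 'lxml')
Note this only works if the login state lives in cookies; if the site also checks request headers such as User-Agent, you may need to set those on the session as well.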