I've written a Python script that fetches the name shown on my Stack Overflow profile. What I'd like to do now is log in to the site using the requests module and, once logged in, fetch the profile name using Selenium. The bottom line is: when requests gives me the post-login profile URL, I want Selenium to reuse that URL (and the logged-in session) to fetch the profile name.
This is my working solution using requests:
import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

# Fetch the login page to extract the CSRF token ("fkey")
req = requests.get(url)
sauce = BeautifulSoup(req.text, "lxml")
fkey = sauce.select_one("[name='fkey']")['value']

payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version': '',
    'oauth_server': ''
}

# Post the credentials, then read the profile name from the avatar tooltip
res = requests.post(url, data=payload)
soup = BeautifulSoup(res.text, "lxml")
item = soup.select_one("div[class^='gravatar-wrapper-']").get("title")
print(item)
What I wish to do now is:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

driver = webdriver.Chrome()

# Log in with requests, as before
req = requests.get(url)
sauce = BeautifulSoup(req.text, "lxml")
fkey = sauce.select_one("[name='fkey']")['value']

payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version': '',
    'oauth_server': ''
}

res = requests.post(url, data=payload)

# Hand the requests cookies over to Selenium
cookie_item = [{'name': name, 'value': value} for name, value in req.cookies.items()]
driver.add_cookie(cookie_item[0])
driver.get(res.url)
item = driver.find_element_by_css_selector("div[class^='gravatar-wrapper-']").get_attribute("title")
print(item)
Upon execution I encounter the following error:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unable to set cookie
How can I fetch the profile name using Selenium by reusing the profile URL (and logged-in session) derived from requests?
One option is the selenium-requests package, which extends the Selenium WebDriver classes to include the request method from the Requests library, while doing all the needed cookie and request-header handling for you.
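A minimal sketch of that approach (assuming the package is installed with pip install selenium-requests; the login payload would be the same as in your script):

from seleniumrequests import Chrome

driver = Chrome()
# request() behaves like requests.request(), but it shares the browser's
# cookies, so the session stays logged in for later driver.get() calls
response = driver.request('GET', 'https://stackoverflow.com/')
print(response.status_code)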
We can also get the User-Agent information from the Selenium webdriver itself, which is handy if you want your requests headers to match the browser exactly. This is done with the JavaScript executor: Selenium executes JavaScript commands through the execute_script method, and to obtain the User-Agent we return navigator.userAgent.
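A minimal sketch (any WebDriver works; shown here with Chrome):

from selenium import webdriver

driver = webdriver.Chrome()
# Ask the browser itself which User-Agent header it sends
user_agent = driver.execute_script("return navigator.userAgent;")
print(user_agent)
driver.quit()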
It's probably more appropriate to use the Stack Exchange API rather than scrape the site, but in any case...
There are a few problems:
- You will sometimes get a captcha challenge.
- Leaving the default requests headers increases the odds of getting a captcha, so override them with headers from a traditional browser.
- You need to use requests.Session() to maintain the cookies from both of the first two requests.
- Before adding the cookies from the requests session, you need to make an initial request with webdriver and clear any cookies that were created.
Taking those things into account, I was able to get it to work with the following:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

# Use a browser-like User-Agent to reduce the odds of a captcha challenge
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36"
    )
}

# A Session keeps the cookies from the GET and the POST together
s = requests.Session()
req = s.get(url, headers=headers)

payload = {
    "fkey": BeautifulSoup(req.text, "lxml").select_one("[name='fkey']")["value"],
    "email": "YOUR_EMAIL",
    "password": "YOUR_PASSWORD",
}

res = s.post(url, headers=headers, data=payload)
if "captcha" in res.url:
    raise ValueError("Encountered captcha")

driver = webdriver.Chrome()
try:
    # Visit the domain first so add_cookie() is allowed, then start clean
    driver.get(res.url)
    driver.delete_all_cookies()
    for cookie in s.cookies.items():
        driver.add_cookie({"name": cookie[0], "value": cookie[1]})
    driver.get(res.url)
    item = driver.find_element_by_css_selector("div[class^='gravatar-wrapper-']")
    print(item.get_attribute("title"))
finally:
    driver.quit()
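One refinement worth noting (my own suggestion, not part of the answer above): name/value pairs alone can be ambiguous if a site sets cookies on several paths or subdomains. Iterating the cookie jar directly carries the domain and path across as well:

# s.cookies is a RequestsCookieJar; iterating yields http.cookiejar.Cookie
# objects, which expose domain and path in addition to name and value
for c in s.cookies:
    driver.add_cookie({
        "name": c.name,
        "value": c.value,
        "domain": c.domain,
        "path": c.path,
    })

add_cookie() still requires that the browser already be on a page within the cookie's domain, which the first driver.get(res.url) above takes care of.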
You need to be on the domain that the cookie will be valid for. Before calling driver.add_cookie(), you must first navigate to [any] page from that domain, so make an additional call to driver.get(url) before attempting to add cookies. Even an error page will suffice, for example:
driver.get('https://stackoverflow.com/404')
Change this in your code:
driver.add_cookie(cookie_item[0])
driver.get(res.url)
to this:
driver.get('https://stackoverflow.com/404')
driver.add_cookie(cookie_item[0])
driver.get(res.url)