Can't fetch the profile name using Selenium after logging in using requests

I've written a Python script to get the name shown on my Stack Overflow (SO) profile. What I'd like to do is log in to the site using the requests module and, once logged in, fetch the profile name using Selenium. The bottom line is: when requests gives me the profile URL after login, I want Selenium to reuse that URL to fetch the profile name.

This is a working solution using requests:

import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

req = requests.get(url)
sauce = BeautifulSoup(req.text,"lxml")
fkey = sauce.select_one("[name='fkey']")['value']
payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version':'', 
    'oauth_server':'' 
    }
res = requests.post(url,data=payload)
soup = BeautifulSoup(res.text,"lxml")
item = soup.select_one("div[class^='gravatar-wrapper-']").get("title")
print(item)

What I wish to do now is:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

driver = webdriver.Chrome()

req = requests.get(url)
sauce = BeautifulSoup(req.text,"lxml")
fkey = sauce.select_one("[name='fkey']")['value']
payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version':'', 
    'oauth_server':'' 
    }
res = requests.post(url,data=payload)
cookie_item = [{'name':name, 'value':value} for name,value in req.cookies.items()]
driver.add_cookie(cookie_item[0])
driver.get(res.url)
item = driver.find_element_by_css_selector("div[class^='gravatar-wrapper-']").get_attribute("title")
print(item)

Upon execution I encounter the following error:

raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unable to set cookie

How can I fetch the profile name with Selenium by reusing the profile URL obtained with requests?

MITHU asked Feb 06 '19



2 Answers

It's probably more appropriate to use the Stack Exchange API rather than scrape the site, but in any case...

There are a few problems:

  1. You will sometimes get a captcha challenge.

  2. Sending the default requests User-Agent header increases the odds of getting a captcha, so override it with a User-Agent from a regular browser.

  3. You need to use requests.Session() to maintain the cookies from both of the first two requests.

  4. Before adding the cookies from the requests session, you need to make an initial request with webdriver and clear any created cookies.

Taking those things into account, I was able to get it to work with the following:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36"
    )
}

s = requests.Session()

req = s.get(url, headers=headers)
payload = {
    "fkey": BeautifulSoup(req.text, "lxml").select_one("[name='fkey']")["value"],
    "email": "YOUR_EMAIL",
    "password": "YOUR_PASSWORD",
}

res = s.post(url, headers=headers, data=payload)

if "captcha" in res.url:
    raise ValueError("Encountered captcha")

driver = webdriver.Chrome()

try:
    driver.get(res.url)
    driver.delete_all_cookies()

    for cookie in s.cookies.items():
        driver.add_cookie({"name": cookie[0], "value": cookie[1]})

    driver.get(res.url)

    item = driver.find_element_by_css_selector("div[class^='gravatar-wrapper-']")
    print(item.get_attribute("title"))
finally:
    driver.quit()
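As a side note, the cookie loop above only carries over each cookie's name and value. If the session ever holds cookies for more than one domain or path, it helps to keep those attributes too. A small helper along these lines (the function name is illustrative, not part of the original answer):

```python
import requests

def session_cookies_to_selenium(session):
    """Convert a requests.Session cookie jar into the list-of-dicts
    format accepted by Selenium's driver.add_cookie(), preserving
    each cookie's domain and path attributes."""
    return [
        {"name": c.name, "value": c.value, "domain": c.domain, "path": c.path}
        for c in session.cookies
    ]
```

Each resulting dict can then be passed to driver.add_cookie() once the browser is on the matching domain.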
cody answered Sep 28 '22

You need to be on the domain that the cookie will be valid for.

Before calling driver.add_cookie(), you must first navigate to a page from that domain with driver.get(...). Even an error page will suffice, for example:

driver.get('https://stackoverflow.com/404')

Change this in your code:

driver.add_cookie(cookie_item[0])
driver.get(res.url)

to this:

driver.get('https://stackoverflow.com/404')
driver.add_cookie(cookie_item[0])
driver.get(res.url)
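The underlying rule is that WebDriver only accepts add_cookie() for cookies whose domain matches the page currently loaded, which is why that extra driver.get() has to come first. A rough, illustrative approximation of that matching rule (this is a sketch, not a Selenium API):

```python
from urllib.parse import urlparse

def cookie_valid_for(cookie_domain, page_url):
    """Rough approximation of the domain-match check a browser applies
    when a cookie is added while a given page is loaded."""
    host = urlparse(page_url).hostname or ""
    domain = cookie_domain.lstrip(".")
    # A cookie is accepted for the exact host or any of its subdomains.
    return host == domain or host.endswith("." + domain)
```

Under this rule, a cookie for stackoverflow.com is accepted while the browser sits on the /404 page, but would be rejected on a blank about: page or any other domain.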
Corey Goldberg answered Sep 28 '22