I've written a Python script that fetches the name shown on my Stack Overflow profile. What I'd like to do now is log in to the site using the requests module and, once logged in, fetch the profile name using Selenium. The bottom line is: when requests gives me the post-login profile URL, I want Selenium to reuse that URL (and the logged-in session) to fetch the profile name.
This is my working solution using requests:
import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

# Fetch the login page to extract the CSRF token ("fkey")
req = requests.get(url)
sauce = BeautifulSoup(req.text, "lxml")
fkey = sauce.select_one("[name='fkey']")['value']

payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version': '',
    'oauth_server': ''
}

# Post the credentials, then read the profile name from the avatar tooltip
res = requests.post(url, data=payload)
soup = BeautifulSoup(res.text, "lxml")
item = soup.select_one("div[class^='gravatar-wrapper-']").get("title")
print(item)
What I wish to do now is:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

driver = webdriver.Chrome()

# Log in with requests, as before
req = requests.get(url)
sauce = BeautifulSoup(req.text, "lxml")
fkey = sauce.select_one("[name='fkey']")['value']

payload = {
    'fkey': fkey,
    'ssrc': 'head',
    'email': my_username,
    'password': my_password,
    'oauth_version': '',
    'oauth_server': ''
}

res = requests.post(url, data=payload)

# Hand the requests cookies over to Selenium
cookie_item = [{'name': name, 'value': value} for name, value in req.cookies.items()]
driver.add_cookie(cookie_item[0])
driver.get(res.url)
item = driver.find_element_by_css_selector("div[class^='gravatar-wrapper-']").get_attribute("title")
print(item)
Upon execution I encounter the following error:
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unable to set cookie
How can I fetch the profile name using Selenium by reusing the profile URL (and logged-in session) derived from requests?
One option is the selenium-requests package, which extends the Selenium WebDriver classes to include the request method from the Requests library, while doing all the needed cookie and request-header handling for you.
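A minimal sketch of that approach (assuming the package is installed with pip install selenium-requests; the login payload would be the same as in your script):

from seleniumrequests import Chrome

driver = Chrome()
# request() behaves like requests.request(), but it shares the browser's
# cookies, so the session stays logged in for later driver.get() calls
response = driver.request('GET', 'https://stackoverflow.com/')
print(response.status_code)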
We can also get the User-Agent information from the Selenium webdriver itself, which is handy if you want your requests headers to match the browser exactly. This is done with the JavaScript executor: Selenium executes JavaScript commands through the execute_script method, and to obtain the User-Agent we return navigator.userAgent.
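A minimal sketch (any WebDriver works; shown here with Chrome):

from selenium import webdriver

driver = webdriver.Chrome()
# Ask the browser itself which User-Agent header it sends
user_agent = driver.execute_script("return navigator.userAgent;")
print(user_agent)
driver.quit()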
It's probably more appropriate to use the Stack Exchange API rather than scrape the site, but in any case...
There are a few problems:
- You will sometimes get a captcha challenge.
- Leaving the default requests headers increases the odds of getting a captcha, so override them with headers from a traditional browser.
- You need to use requests.Session() to maintain the cookies from both of the first two requests.
- Before adding the cookies from the requests session, you need to make an initial request with webdriver and clear any cookies that were created.
Taking those things into account, I was able to get it to work with the following:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://stackoverflow.com/users/login?ssrc=head&returnurl=https%3a%2f%2fstackoverflow.com%2f"

# Use a browser-like User-Agent to reduce the odds of a captcha challenge
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.81 Safari/537.36"
    )
}

# A Session keeps the cookies from the GET and the POST together
s = requests.Session()
req = s.get(url, headers=headers)

payload = {
    "fkey": BeautifulSoup(req.text, "lxml").select_one("[name='fkey']")["value"],
    "email": "YOUR_EMAIL",
    "password": "YOUR_PASSWORD",
}

res = s.post(url, headers=headers, data=payload)
if "captcha" in res.url:
    raise ValueError("Encountered captcha")

driver = webdriver.Chrome()
try:
    # Visit the domain first so add_cookie() is allowed, then start clean
    driver.get(res.url)
    driver.delete_all_cookies()
    for cookie in s.cookies.items():
        driver.add_cookie({"name": cookie[0], "value": cookie[1]})
    driver.get(res.url)
    item = driver.find_element_by_css_selector("div[class^='gravatar-wrapper-']")
    print(item.get_attribute("title"))
finally:
    driver.quit()
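One refinement worth noting (my own suggestion, not part of the answer above): name/value pairs alone can be ambiguous if a site sets cookies on several paths or subdomains. Iterating the cookie jar directly carries the domain and path across as well:

# s.cookies is a RequestsCookieJar; iterating yields http.cookiejar.Cookie
# objects, which expose domain and path in addition to name and value
for c in s.cookies:
    driver.add_cookie({
        "name": c.name,
        "value": c.value,
        "domain": c.domain,
        "path": c.path,
    })

add_cookie() still requires that the browser already be on a page within the cookie's domain, which the first driver.get(res.url) above takes care of.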
You need to be on the domain that the cookie will be valid for. Before calling driver.add_cookie(), you must first navigate to [any] page from that domain, so make an additional call to driver.get(url) before attempting to add cookies. Even an error page will suffice, for example:
driver.get('https://stackoverflow.com/404')
Change this in your code:
driver.add_cookie(cookie_item[0])
driver.get(res.url)
to this:
driver.get('https://stackoverflow.com/404')
driver.add_cookie(cookie_item[0])
driver.get(res.url)