
Is Selenium slow, or is my code wrong?

So I'm trying to log in to Quora using Python and then scrape some stuff.

I'm using Selenium to log in to the site. Here's my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get('http://www.quora.com/')

username = driver.find_element_by_name('email')
password = driver.find_element_by_name('password')

username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)

driver.close()

Now the questions:

  1. It took ~4 minutes to find and fill the login form, which is painfully slow. Is there something I can do to speed up the process?

  2. Once it has logged in, how do I make sure there were no errors? In other words, how do I check the response code?

  3. How do I save cookies with Selenium so I can continue scraping once I log in?

  4. If there is no way to make selenium faster, is there any other alternative for logging in? (Quora doesn't have an API)

KGo asked Jul 04 '13 05:07



4 Answers

I had a similar problem with very slow find_elements_xxx calls in Python selenium using the ChromeDriver. I eventually tracked down the trouble to a driver.implicitly_wait() call I made prior to my find_element_xxx() calls; when I took it out, my find_element_xxx() calls ran quickly.

Now, I know those elements were there when I made the find_elements_xxx() calls, so I cannot see why the implicit wait should have affected the speed of those operations, but it did.
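
With an implicit wait set, a lookup that matches nothing blocks until the timeout expires before it returns, so a slow call can point to a missing element rather than a slow driver. One way to check this is to time a single lookup with the implicit wait set to a known value. This is only a sketch: the helper name time_lookup is made up for illustration, and it assumes a Selenium 4 style driver object, using nothing but its implicitly_wait() and find_elements() methods. It leaves the implicit wait at the value passed in.

```python
import time


def time_lookup(driver, by, value, implicit_wait=0.0):
    """Return (match_count, seconds) for one find_elements() call.

    `by` is a locator strategy such as By.NAME (the string "name") and
    `value` is what to look for, e.g. "email".
    """
    # Set the implicit wait explicitly so the measurement is not affected
    # by a timeout configured earlier in the script.
    driver.implicitly_wait(implicit_wait)
    start = time.perf_counter()
    matches = driver.find_elements(by, value)
    elapsed = time.perf_counter() - start
    return len(matches), elapsed
```

Timing the same lookup once with implicit_wait=0 and once with the timeout the script originally used should show whether the wait is what costs the time.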

Polly answered Oct 15 '22 22:10


  1. I have been there; Selenium is slow, though it should not take as long as 4 minutes to fill in a form. I switched to PhantomJS, which is much faster than Firefox because it is headless. After installing the latest PhantomJS, you can simply replace Firefox() with PhantomJS() in the webdriver line.

  2. To check that you have logged in, you can assert that some element which is only displayed after login is present.

  3. As long as you do not quit your driver, the cookies will still be available when you follow links.

  4. You can try using urllib and posting directly to the login link, with cookiejar to save the cookies. You can even save the cookie yourself; after all, a cookie is simply a string in an HTTP header.
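
Points 2 and 3 can be sketched in Python roughly as follows. Selenium does not expose the HTTP response code, which is why the check looks for an element instead. The helper names and the JSON file are made up for illustration, and the only driver methods assumed are find_elements(), get_cookies() and add_cookie().

```python
import json


def is_logged_in(driver, by, value):
    """Return True if an element that only appears after login is present."""
    # find_elements() returns an empty list instead of raising when
    # nothing matches, so it works as a presence check.
    return len(driver.find_elements(by, value)) > 0


def save_cookies(driver, path):
    """Write the browser session's cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def load_cookies(driver, path):
    """Add cookies from a JSON file to the current browser session.

    The browser should already be on a page of the cookies' domain;
    drivers typically reject cookies for any other domain.
    """
    with open(path, encoding="utf-8") as fh:
        for cookie in json.load(fh):
            driver.add_cookie(cookie)
```

After a successful login you would call save_cookies(driver, 'cookies.json'). In a later run, open the site first with driver.get(), call load_cookies(driver, 'cookies.json'), and then reload the page.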

manish answered Oct 15 '22 22:10


You can speed up your form filling by using your own setAttribute method. Here is the Java code for it:

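// Sets an attribute on the located element through JavaScript instead of sendKeys.
// getDriver() and getElement(locator) are helpers from the surrounding class (not shown).
public void setAttribute(By locator, String attribute, String value) {
    ((JavascriptExecutor) getDriver()).executeScript(
            "arguments[0].setAttribute(arguments[1], arguments[2]);",
            getElement(locator),
            attribute,
            value);
}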
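
Since the question uses Python, the same idea could look roughly like this there. This is a sketch, not a drop-in replacement: set_attribute is a made-up helper name, and it assumes only the driver's execute_script() method and an element that has already been located. A page that listens for keyboard events may not notice a value set this way.

```python
def set_attribute(driver, element, attribute, value):
    """Set an HTML attribute on an already-located element via JavaScript."""
    # Passing the element and both strings as script arguments avoids
    # building JavaScript source by string concatenation.
    driver.execute_script(
        "arguments[0].setAttribute(arguments[1], arguments[2]);",
        element,
        attribute,
        value,
    )
```

For the login form in the question, the call would be something like set_attribute(driver, username, 'value', 'email').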

Stormy answered Oct 15 '22 23:10


Running the web driver headlessly should improve its execution speed to some degree.

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument('-headless')  # run Firefox without a visible window
browser = webdriver.Firefox(options=options)

browser.get('https://google.com/')
browser.quit()  # end the session and shut down the browser

oldboy answered Oct 15 '22 22:10