
Scrolling a web page using Selenium Python WebDriver

I am scraping this web page for usernames. The page loads more users as you scroll.

URL of the page: "http://www.quora.com/Kevin-Rose/followers"

I know the number of users on the page (43812 in this case). How can I scroll the page until all the users are loaded? I have searched the internet for this, and almost everywhere I found the same line of code:

driver.execute_script("window.scrollTo(0, )")

How can I determine the vertical position to ensure that all the users are loaded? Is there any other option to achieve the same thing without actually scrolling?

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import urllib

driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
time.sleep(10)

wait = WebDriverWait(driver, 10)

form = driver.find_element(By.CLASS_NAME, 'regular_login')
time.sleep(10)
#add explicit wait

username = form.find_element(By.NAME, 'email')
time.sleep(10)
#add explicit wait

username.send_keys('[email protected]')
time.sleep(30)
#add explicit wait

password = form.find_element(By.NAME, 'password')
time.sleep(30)
#add explicit wait

password.send_keys('def')
#add explicit wait

password.send_keys(Keys.RETURN)
time.sleep(30)

#search = driver.find_element_by_name('search_input')
search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))

search.clear()
search.send_keys('Kevin Rose')
search.send_keys(Keys.RETURN)

link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
link.click()
# Wait until the element is loaded (asynchronously loaded web page)

handle = driver.window_handles
driver.switch_to.window(handle[1])
#switch to new window 

element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
element.click()
Siddhesh asked Sep 30 '22 20:09


1 Answer

Since nothing special appears after the last bucket of followers is loaded, I would rely on the fact that you know how many followers the user has and how many are loaded on each scroll down (I've inspected it: 18 per scroll). Hence, you can calculate how many times you need to scroll the page down.
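As a rough sketch of that arithmetic only (the function name `scrolls_needed` is hypothetical, and the batch size of 18 is the value observed above, which the site may change), the count is a ceiling division:

```python
def scrolls_needed(follower_count, followers_per_scroll=18):
    """Return how many batches of followers have to be loaded.

    Assumes every scroll loads exactly `followers_per_scroll` users;
    the default of 18 is the value observed on the page, not a constant
    guaranteed by the site.
    """
    # Integer ceiling division: a final, partially filled batch
    # still needs its own scroll.
    return (follower_count + followers_per_scroll - 1) // followers_per_scroll


print(scrolls_needed(53))     # 3
print(scrolls_needed(43812))  # 2434
```

Adding one extra scroll on top of this number is harmless and gives a small safety margin.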

Here's the implementation (I've used a different user with only 53 followers to demonstrate the solution):

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

followers_per_page = 18

driver = webdriver.Chrome()  # webdriver.Firefox() in your case
driver.get("http://www.quora.com/Andrew-Delikat/followers")

# get the followers count
element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
followers_count = int(element.text.replace(',', ''))
print(followers_count)

# scroll down the page iteratively with a delay
for _ in range(followers_count // followers_per_page + 1):
    driver.execute_script("window.scrollTo(0, 10000);")
    time.sleep(2)

Also, if there is a large number of followers, you may need to increase the 10000 Y coordinate based on the loop variable: once the page grows taller than that fixed value, scrolling to it no longer reaches the bottom.
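One way to avoid hard-coding a Y coordinate is to scroll to `document.body.scrollHeight` and stop when the height no longer grows. The following is a minimal sketch of that idea, not a tested solution for this particular site. The helper name `scroll_until_stable` and its `pause` and `max_scrolls` parameters are hypothetical, and it assumes each new batch finishes loading within `pause` seconds (on a slow connection the loop could stop too early).

```python
import time


def scroll_until_stable(driver, pause=2.0, max_scrolls=5000):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object exposing Selenium's execute_script().
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    scrolls = 0
    while scrolls < max_scrolls:
        # Jump to the current bottom of the page to trigger the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumed to be long enough for the batch to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        scrolls += 1
        if new_height == last_height:
            break  # nothing new was appended, so assume everything is loaded
        last_height = new_height
    return scrolls
```

With a real browser this would be called as `scroll_until_stable(driver)` after `driver.get(...)`. Comparing the number of loaded user elements against the known follower count afterwards is a reasonable sanity check.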

alecxe answered Oct 13 '22 00:10