 

Invoking an onclick event with BeautifulSoup in Python

I am trying to fetch the links to all accommodations in Cyprus from this website: http://www.zoover.nl/cyprus

So far I can retrieve the first 15, which are already shown. Next I need to invoke a click on the "volgende" ("next") link. However, I don't know how to do that, and in the source code I can't track down the function that gets called, so I can't use something like what was posted here: Issues with invoking "on click event" on the html page using beautiful soup in Python

I only need the step where the "clicking" happens, so that I can fetch the next 15 links and so on.

Does anybody know how to do this? Thanks in advance!

EDIT:

My code looks like this now:

from selenium import webdriver

def getZooverLinks(country):
    zooverWeb = "http://www.zoover.nl/"
    url = zooverWeb + country
    # parseURL is my own helper (not shown); it returns the parsed page (BeautifulSoup)
    parsedZooverWeb = parseURL(url)
    driver = webdriver.Firefox()
    driver.get(url)

    button = driver.find_element_by_class_name("next")
    links = []
    for page in xrange(1,3):
        for item in parsedZooverWeb.find_all(attrs={'class': 'blue2'}):
            for link in item.find_all('a'):
                newLink = zooverWeb + link.get('href')
                links.append(newLink)
        button.click()
    return links

and I get the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: Element is no longer attached to the DOM
Stacktrace:
    at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8956)
    at Utils.getElementAt (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:8546)
    at fxdriver.preconditions.visible (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:9585)
    at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12257)
    at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12274)
    at DelayedCommand.prototype.executeInternal_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12279)
    at DelayedCommand.prototype.execute/< (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12221)

I'm confused :/

steph asked Apr 01 '15 07:04


2 Answers

While it might be tempting to try to do this with BeautifulSoup, in the end BeautifulSoup is a parser rather than an interactive web browsing client: it cannot execute JavaScript, so it cannot run an onclick handler.
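A small illustration of that limitation, assuming hypothetical pager markup modeled on the EntityQuery handler quoted further down in this answer (the real zoover.nl markup may differ): BeautifulSoup hands the onclick handler back as a plain string, and nothing executes it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the "volgende" (next) link discussed here.
html = """
<div class="pager">
  <a class="next" href="#"
     onclick="EntityQuery('Ns=pPopularityScore%7c1&amp;No=30&amp;props=15292&amp;dims=530&amp;As=&amp;N=0+3+10500915');">volgende</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
next_link = soup.find("a", class_="next")

# The handler is only an attribute value to BeautifulSoup; it is never run,
# so no request for the next page of results is ever made.
handler = next_link["onclick"]
print(next_link.get_text())                # volgende
print(handler.startswith("EntityQuery("))  # True
```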

You should seriously consider solving this with Selenium, as briefly shown in this answer. There are good Python bindings available for Selenium.

You could just use Selenium to find the element and click it, then pass the resulting page source on to BeautifulSoup and use your existing code to fetch the links.
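A minimal sketch of that approach in Python 3, assuming the results still sit in elements with class blue2 and the pager link has class next (both taken from the question's code; the live site may have changed since), and that Firefox with geckodriver is installed. Two things differ from the question's code: the page is re-parsed from driver.page_source after every click, and the next button is looked up again on every page instead of reusing a reference that has gone stale. Waiting for the old button to go stale is an assumption about how the page swaps in new results. It uses find_element(By.CLASS_NAME, ...), which works in both Selenium 3 and 4 (the find_element_by_* helpers were removed in Selenium 4.3). The function names extract_links and crawl are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://www.zoover.nl/"


def extract_links(html, base_url=BASE_URL):
    """Return absolute URLs of the links inside elements with class 'blue2'."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for item in soup.find_all(class_="blue2"):
        for a in item.find_all("a", href=True):
            # urljoin copes with both relative and root-relative hrefs.
            links.append(urljoin(base_url, a["href"]))
    return links


def crawl(country, pages=3, timeout=10):
    """Collect links from the first `pages` result pages using a real browser."""
    # Imported here so extract_links() works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(urljoin(BASE_URL, country))
        links = []
        for page in range(pages):
            # Parse what the browser shows now, not the originally downloaded HTML.
            links.extend(extract_links(driver.page_source))
            if page == pages - 1:
                break  # no need to click past the last wanted page
            try:
                # Look the button up again on every page: the previous reference
                # is detached from the DOM once the results are replaced.
                button = driver.find_element(By.CLASS_NAME, "next")
            except NoSuchElementException:
                break  # no further pages
            button.click()
            # Assumption: the old button is swapped out when new results load.
            WebDriverWait(driver, timeout).until(EC.staleness_of(button))
        return links
    finally:
        driver.quit()
```

With a browser available, `links = crawl("cyprus")` would return the collected URLs. Keeping the parsing in a separate function means the BeautifulSoup part can be checked against saved HTML without starting a browser.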

Alternatively, you could use the JavaScript that's listed in the onclick handler. I pulled this from the source: EntityQuery('Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915');. The No parameter increases by 15 for each page, but the props parameter has me guessing. I'd recommend not getting into this, though, and instead interacting with the website as a client would, using Selenium. That is also much more robust to changes on their side.
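For illustration only, a sketch of how a page offset could be plugged into that call, assuming No is a zero-based result offset that grows by 15 per page (so page index 2 reproduces the No=30 quoted above). The other parameters are copied unchanged because their meaning is unknown; the helper name entity_query_call is hypothetical, and EntityQuery is the site's own function, which may no longer exist.

```python
# Query string from the onclick handler quoted above, with "No" turned into a
# placeholder. What "props", "dims" and "N" mean is not known.
QUERY_TEMPLATE = "Ns=pPopularityScore%7c1&No={offset}&props=15292&dims=530&As=&N=0+3+10500915"
PAGE_SIZE = 15  # "No" appeared to grow by 15 per page of results


def entity_query_call(page):
    """Return the JavaScript call for a zero-based results page.

    Assumption: page 0 -> No=0, page 1 -> No=15, page 2 -> No=30, ...
    """
    offset = page * PAGE_SIZE
    return "EntityQuery('{}');".format(QUERY_TEMPLATE.format(offset=offset))


for page in range(3):
    print(entity_query_call(page))
```

If the page still defines EntityQuery, such a string could be run in the browser with Selenium's `driver.execute_script(...)`; as noted above, clicking the link like a normal client is the more robust route.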

Joost answered Oct 13 '22 00:10


I tried the following code and was able to load the next page. Hope this helps you too. Code:

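from selenium import webdriver
import os

# Raw string so the backslashes in the Windows path are not read as escape sequences
chromedriver = r"C:\Users\pappuj\Downloads\chromedriver"
# Carried over from the Java setup; the Python bindings use the path passed to webdriver.Chrome()
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
url = 'http://www.zoover.nl/cyprus'
driver.get(url)
# Locate the "volgende" (next) link by its class and click it
driver.find_element_by_class_name('next').click()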

Thanks

user4901185 answered Oct 13 '22 00:10