
Click display button in Scrapy-Splash

I am scraping http://www.starcitygames.com/buylist/ with scrapy-splash. The page requires a login, which works fine, but the data I need only appears after the Display button is clicked. A previous answer told me that I cannot simply click the button and scrape what shows up, and that I should scrape the JSON page behind it instead. I am concerned that requesting the JSON directly will be a red flag to the site owners: most visitors never open that page, and a human would need several minutes to find it, whereas a script requests it immediately. So my question is: is there any way to scrape the page by clicking Display and going from there, or do I have no choice but to scrape the JSON page? This is what I have so far, but it is not clicking the button.

import scrapy
from ..items import NameItem

class LoginSpider(scrapy.Spider):
    name = "LoginSpider"
    start_urls = ["http://www.starcitygames.com/buylist/"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formcss='#existing_users form',
            formdata={'ex_usr_email': '[email protected]', 'ex_usr_pass': 'password'},
            callback=self.after_login,
        )



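    def after_login(self, response):
        # Following the link's href only issues a new request; it does not
        # trigger the JavaScript click handler behind the Display button.
        display_button = response.xpath('//a[contains(., "Display>>")]/@href').get()
        if display_button:
            yield response.follow(display_button, self.parse_results)

    def parse_results(self, response):
        # A separate callback, so the login form in parse() is not resubmitted.
        item = NameItem()
        item["Name"] = response.css("div.bl-result-title::text").get()
        yield item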


Tim asked Jun 25 '19 16:06




2 Answers

You can use your browser's developer tools to track the request that the click triggers. The response comes back as clean JSON, and the request needs no login cookie:

http://www.starcitygames.com/buylist/search?search-type=category&id=5061

The only value you need to fill in is the category id for the request. It can be extracted from the HTML and declared in your code.

Category name:

//*[@id="bl-category-options"]/option/text()

Category id:

//*[@id="bl-category-options"]/option/@value

Working with JSON is much simpler than parsing HTML.
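As a rough sketch of the approach above, the snippet below collects the category ids and names from the page HTML and builds the JSON search URL for each one. It uses only the standard library so it can be run on its own. The names build_search_url and CategoryOptionParser are made up for illustration, and the code assumes that the element with id bl-category-options is a select element; inside a Scrapy spider the two XPath expressions above would do the same extraction.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Endpoint taken from the request URL shown above.
SEARCH_URL = "http://www.starcitygames.com/buylist/search"


def build_search_url(category_id):
    """Return the JSON search URL for one category id (hypothetical helper)."""
    query = urlencode({"search-type": "category", "id": category_id})
    return f"{SEARCH_URL}?{query}"


class CategoryOptionParser(HTMLParser):
    """Collect (id, name) pairs from the options of #bl-category-options.

    Assumption: the element with that id is a <select>.
    """

    def __init__(self):
        super().__init__()
        self.categories = []        # list of (id, name) tuples
        self._in_select = False     # True while inside the target <select>
        self._current_value = None  # value attribute of the open <option>
        self._text = []             # text fragments of the open <option>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "bl-category-options":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._current_value = attrs.get("value")
            self._text = []

    def handle_data(self, data):
        if self._current_value is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            name = "".join(self._text).strip()
            self.categories.append((self._current_value, name))
            self._current_value = None
        elif tag == "select":
            self._in_select = False
```

Feeding the buylist page HTML to the parser and calling build_search_url on each collected id gives one URL per category, and each response can then be decoded with json.loads.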

Kamoo answered Oct 14 '22 21:10


I tried to emulate the click with scrapy-splash, using a Lua script. It works; you just have to integrate it with Scrapy and process the content it returns. Here is the script, which I am still finishing integrating with Scrapy.

function main(splash)
  local url = 'https://www.starcitygames.com/login'
  assert(splash:go(url))
  assert(splash:wait(0.5))
  assert(splash:runjs('document.querySelector("#ex_usr_email_input").value = "[email protected]"'))
  assert(splash:runjs('document.querySelector("#ex_usr_pass_input").value = "your_password"'))
  splash:wait(0.5)
  assert(splash:runjs('document.querySelector("#ex_usr_button_div button").click()'))
  splash:wait(3)
  splash:go('https://www.starcitygames.com/buylist/')
  splash:wait(2)
  assert(splash:runjs('document.querySelectorAll(".bl-specific-name")[1].click()'))
  splash:wait(1)
  assert(splash:runjs('document.querySelector("#bl-search-category").click()'))
  splash:wait(3)
  splash:set_viewport_size(1200,2000)
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
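One possible way to run a script like this, sketched here with the standard library only, is to POST it to the execute endpoint of Splash's HTTP API as the lua_source field of a JSON body. The helper names build_execute_request and extract_html are invented for this example, and the address http://localhost:8050 is an assumption about where a Splash instance might be listening. The request is only built here, not sent.

```python
import json
from urllib.request import Request


def build_execute_request(lua_source, splash_url="http://localhost:8050"):
    """Build (but do not send) a POST request for Splash's /execute endpoint.

    splash_url is an assumed address; adjust it to your own Splash instance.
    """
    payload = json.dumps({"lua_source": lua_source}).encode("utf-8")
    return Request(
        f"{splash_url}/execute",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_html(response_body):
    """Return the 'html' field from the JSON-encoded table the script returns."""
    return json.loads(response_body)["html"]
```

The request can be sent with urllib.request.urlopen, and the body of the reply passed to extract_html. Inside a Scrapy spider, the scrapy-splash counterpart would be a SplashRequest with endpoint='execute' and args={'lua_source': script}.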

GmrYael answered Oct 14 '22 22:10