Click display button in Scrapy-Splash

I am scraping http://www.starcitygames.com/buylist/ with scrapy-splash. The page requires a login, and that part works fine. The data I need, however, is not accessible until the Display button is clicked.

A previous answer told me that I cannot simply click Display and scrape what shows up, and that I should scrape the JSON page associated with that data instead. My concern is that requesting the JSON directly will be a red flag to the site owners: most people never open the JSON page, and a human would need several minutes to find it, whereas a script gets there almost instantly. So my question is: is there any way to scrape the page by clicking Display and going from there, or do I have no choice but to scrape the JSON page? This is what I have so far, but it does not click the button.

import scrapy
from ..items import NameItem


class LoginSpider(scrapy.Spider):
    name = "LoginSpider"
    start_urls = ["http://www.starcitygames.com/buylist/"]

    def parse(self, response):
        # Submit the login form found on the buylist page.
        return scrapy.FormRequest.from_response(
            response,
            formcss='#existing_users form',
            formdata={'ex_usr_email': '[email protected]', 'ex_usr_pass': 'password'},
            callback=self.after_login,
        )

    def after_login(self, response):
        item = NameItem()
        # Attempt to "click" Display by following the link's href.
        display_button = response.xpath('//a[contains(., "Display>>")]/@href').get()

        yield response.follow(display_button, self.parse)

        item["Name"] = response.css("div.bl-result-title::text").get()
        # A generator callback must yield its items; `return item` would drop the item.
        yield item

(Image: snapshot of the website's HTML code)

asked Jun 25 '19 16:06

Tim

2 Answers

You can use your browser's developer tools to track the request fired by that click event. The response comes back as clean JSON, and the request needs no cookie (no login):

http://www.starcitygames.com/buylist/search?search-type=category&id=5061

The only value you need to fill in is the category id (the id parameter of this request). It can be extracted from the page HTML and set in your code.

Category name:

//*[@id="bl-category-options"]/option/text()

Category id:

//*[@id="bl-category-options"]/option/@value
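As a rough illustration of what those two XPaths select, here is a standard-library sketch that collects the option values and labels. It assumes the element with id bl-category-options is a `<select>`; the sample markup and the category names in it are hypothetical, and only the id 5061 comes from the URL above.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real page's option labels are not shown in this thread.
SAMPLE_HTML = (
    '<select id="bl-category-options">'
    '<option value="5061">Example Set</option>'
    '<option value="1234">Another Set</option>'
    '</select>'
)


class CategoryOptionParser(HTMLParser):
    """Collect (category_id, category_name) pairs from #bl-category-options."""

    def __init__(self):
        super().__init__()
        self.categories = []
        self._in_select = False
        self._current_value = None
        self._label_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == "bl-category-options":
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._close_option()  # an <option> may omit its closing tag
            self._current_value = attrs.get("value")
            self._label_parts = []

    def handle_data(self, data):
        # Only keep text that belongs to an option we are currently inside.
        if self._current_value is not None:
            self._label_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self._close_option()
        elif tag == "select":
            self._close_option()
            self._in_select = False

    def _close_option(self):
        if self._current_value is not None:
            label = "".join(self._label_parts).strip()
            self.categories.append((self._current_value, label))
        self._current_value = None
        self._label_parts = []


def extract_categories(html):
    """Return the list of (category_id, category_name) pairs found in html."""
    parser = CategoryOptionParser()
    parser.feed(html)
    parser.close()
    return parser.categories


if __name__ == "__main__":
    for category_id, name in extract_categories(SAMPLE_HTML):
        print(category_id, name)
```

Inside a Scrapy callback the same values should be available more directly by passing each XPath to `response.xpath(...).getall()`.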

Working with JSON is much simpler than parsing HTML.
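A minimal sketch of how the search URL could be built for a given category id and how the reply could be decoded. The helper names are made up for illustration, and since the structure of the JSON reply is not shown in this thread, the sketch only decodes it and makes no assumption about its fields.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this answer.
SEARCH_URL = "http://www.starcitygames.com/buylist/search"


def build_search_url(category_id):
    """Return the buylist search URL for one category id."""
    query = urlencode({"search-type": "category", "id": category_id})
    return f"{SEARCH_URL}?{query}"


def decode_search_response(body):
    """Decode the JSON text of a search reply into Python objects."""
    return json.loads(body)


if __name__ == "__main__":
    # 5061 is the id from the example URL; other ids come from the option values.
    print(build_search_url(5061))
```

In a spider, each URL produced this way would be passed to `scrapy.Request` with a callback that decodes `response.text`.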

answered Oct 14 '22 21:10

Kamoo

I emulated the click with scrapy-splash using a Lua script. It works; you just have to integrate it with Scrapy and process the returned content. Here is the script, without the Scrapy integration:

function main(splash)
  -- Open the login page and fill in the credentials.
  local url = 'https://www.starcitygames.com/login'
  assert(splash:go(url))
  assert(splash:wait(0.5))
  assert(splash:runjs('document.querySelector("#ex_usr_email_input").value = "[email protected]"'))
  assert(splash:runjs('document.querySelector("#ex_usr_pass_input").value = "your_password"'))
  splash:wait(0.5)
  -- Submit the login form and give the site time to log in.
  assert(splash:runjs('document.querySelector("#ex_usr_button_div button").click()'))
  splash:wait(3)
  -- Load the buylist page in the same (now logged-in) session.
  splash:go('https://www.starcitygames.com/buylist/')
  splash:wait(2)
  -- Pick a category: index 1 is the second matching element (JavaScript counts from 0).
  assert(splash:runjs('document.querySelectorAll(".bl-specific-name")[1].click()'))
  splash:wait(1)
  -- Click the Display button and wait for the results to render.
  assert(splash:runjs('document.querySelector("#bl-search-category").click()'))
  splash:wait(3)
  splash:set_viewport_size(1200,2000)
  -- Return the rendered HTML, a screenshot, and the network log.
  return {
    html = splash:html(),
    png = splash:png(),
    har = splash:har(),
  }
end
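One way to try a script like this before wiring it into a spider is to post it to the `execute` endpoint of Splash's HTTP API. The sketch below only builds the request. It assumes a Splash instance on the default local port 8050, uses a shortened stand-in for the Lua script, and the helper name is hypothetical.

```python
import json
from urllib.request import Request

# Assumption: Splash is running locally on its default port.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

# Shortened stand-in for the full Lua script above, to keep the sketch small.
LUA_SOURCE = """
function main(splash)
  assert(splash:go(splash.args.url))
  return {html = splash:html()}
end
"""


def build_execute_request(lua_source, url, timeout=60):
    """Build (but do not send) a POST request for Splash's execute endpoint."""
    payload = {"lua_source": lua_source, "url": url, "timeout": timeout}
    return Request(
        SPLASH_EXECUTE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_execute_request(
        LUA_SOURCE, "https://www.starcitygames.com/buylist/"
    )
    print(request.get_method(), request.full_url)
    # Sending it requires a running Splash instance, for example:
    #   from urllib.request import urlopen
    #   with urlopen(request) as reply:
    #       result = json.load(reply)  # table returned by main(), as JSON
```

With scrapy-splash, the same script would typically be passed as `args={'lua_source': ...}` to a `SplashRequest` that uses `endpoint='execute'`, and the returned table read in the callback.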

answered Oct 14 '22 22:10

GmrYael

Tags: python, splash-screen, web-scraping, scrapy, scrapy-splash
