
Web scraping a webpage whose dynamic content is loaded via AJAX

Say I wish to scrape the products on this page (http://shop.coles.com.au/online/national/bread-bakery/fresh/bread#pageNumber=2&currentPageSize=20).

But the products are loaded by a POST request. Many posts here suggest simulating that request to get the dynamic content, but in my case the form data (e.g. catalogId, categoryId) is unknown to me.

Is it possible to get the response after the AJAX call has finished?

Harrison asked Oct 31 '22 01:10


1 Answer

You can get the catalogId and other parameter values needed to make the POST request from the form with id="search":

<form id="search" name="search" action="http://shop.coles.com.au/online/SearchDisplay?pageView=image&amp;catalogId=10576&amp;beginIndex=0&amp;langId=-1&amp;storeId=10601" method="get" role="search">
    <input type="hidden" name="storeId" value="10601" id="WC_CachedHeaderDisplay_FormInput_storeId_In_CatalogSearchForm_1">
    <input type="hidden" name="catalogId" value="10576" id="WC_CachedHeaderDisplay_FormInput_catalogId_In_CatalogSearchForm_1">
    <input type="hidden" name="langId" value="-1" id="WC_CachedHeaderDisplay_FormInput_langId_In_CatalogSearchForm_1">
    <input type="hidden" name="beginIndex" value="0" id="WC_CachedHeaderDisplay_FormInput_beginIndex_In_CatalogSearchForm_1">
    <input type="hidden" name="browseView" value="false" id="WC_CachedHeaderDisplay_FormInput_browseView_In_CatalogSearchForm_1">
    <input type="hidden" name="searchSource" value="Q" id="WC_CachedHeaderDisplay_FormInput_searchSource_In_CatalogSearchForm_1">
    ...
</form>

Use Scrapy's FormRequest to submit this form.
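To make the idea concrete, here is a minimal sketch of pulling the hidden field values out of the form using only the Python standard library. The names HiddenFieldParser and hidden_fields are hypothetical helpers written for this illustration, and SAMPLE is a shortened copy of the form above, not the live page.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect hidden <input> name/value pairs inside one form (hypothetical helper)."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            # Entering the form we care about.
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("type") == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def hidden_fields(html, form_id="search"):
    """Return a dict of the hidden inputs found in the form with the given id."""
    parser = HiddenFieldParser(form_id)
    parser.feed(html)
    return parser.fields


# Shortened copy of the form shown above (assumption: the live page still looks like this).
SAMPLE = """
<form id="search" name="search" method="get" role="search">
    <input type="hidden" name="storeId" value="10601">
    <input type="hidden" name="catalogId" value="10576">
    <input type="hidden" name="langId" value="-1">
</form>
"""

fields = hidden_fields(SAMPLE)
print(fields)             # the values to reuse in the follow-up request
print(urlencode(fields))  # the same values encoded as form data
```

Inside a Scrapy callback you should not need a parser like this: FormRequest.from_response(response, formid="search") reads the form's fields from the response itself, and its formdata argument lets you override or add values.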


"I'm wondering is it possible to get the response after the ajax call is finished?"

Scrapy is not a browser: it does not make the additional AJAX requests that a page triggers, and it has nothing built in to execute JavaScript. You could instead use a real browser and solve the problem at a higher level; look into the selenium package. There is also the related scrapy-splash project.
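As a rough sketch of the browser-based approach, the helper below loads a page and polls until an element matching a CSS selector appears, then returns the rendered HTML. The function name rendered_html and the ".product" selector in the usage comment are assumptions made for this illustration; the real selector depends on the page's markup. It takes any object with get(), find_elements() and page_source, which a Selenium WebDriver provides.

```python
import time


def rendered_html(driver, url, css_selector, timeout=10.0, poll=0.5):
    """Load url in a browser driver and return the HTML once css_selector matches.

    driver is expected to behave like a Selenium WebDriver.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator strategy string Selenium exposes as By.CSS_SELECTOR.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no element matched {css_selector!r} within {timeout}s")
        time.sleep(poll)


# Usage with Selenium (requires the selenium package and a browser driver):
#
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       html = rendered_html(driver, "http://shop.coles.com.au/...", ".product")
#   finally:
#       driver.quit()
```

Selenium also ships WebDriverWait with expected_conditions for this kind of waiting; the manual loop is used here only to keep the sketch free of imports beyond the standard library.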

See also:

  • selenium with scrapy for dynamic page
alecxe answered Nov 10 '22 00:11