
Scraping a web flyer

I am scraping a web flyer:

https://flipp.com/flyers/groceries

Postal code: N2L2A1

However, even though all the items are coded in HTML with the same tags and class names, I am unable to scrape all of them with Selenium.

I have tried the find_elements method, but I can still scrape only one value. Conceptually, that shouldn't be the case.

Since I am using a common class name, everything should be listed.

That's not what happens here. Am I missing something?

My code looks like this:

driver.find_elements_by_xpath("//html/body/flipp-dialog/div/flipp-toast-container/div/flipp-item-dialog/div/h2/span")
asked Feb 02 '26 22:02 by san guine



1 Answer

The XPath you chose is the problem.

  1. In general, avoid absolute XPaths. Use a relative XPath instead, starting from an element that is actually meaningful for what you want to select.
  2. Make sure you select an element that actually exists on the page. Sometimes this means you need to hover over or click on something before that element becomes available. For instance, when I searched for flipp-toast-container on the page you provided in the comment, all it contains is:

    <flipp-toast-container global=""><flipp-toast></flipp-toast><div class="toastable-content"></div></flipp-toast-container>
    

    So your XPath won't select anything meaningful there.
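Point 2 can be checked offline. The sketch below feeds the markup quoted above to Python's standard-library xml.etree.ElementTree, which understands a small subset of XPath, and applies the part of the question's absolute path that comes after flipp-toast-container. It assumes that fragment is the whole container at the moment the XPath runs; ElementTree only stands in for the browser here and is not how Selenium evaluates XPath.

```python
import xml.etree.ElementTree as ET

# The markup found for flipp-toast-container, as quoted in the answer.
TOAST_HTML = (
    '<flipp-toast-container global="">'
    '<flipp-toast></flipp-toast>'
    '<div class="toastable-content"></div>'
    '</flipp-toast-container>'
)

# The tail of the question's absolute XPath, i.e. everything after
# .../flipp-toast-container, written relative to the container element.
TAIL = "./div/flipp-item-dialog/div/h2/span"


def spans_under_toast(markup):
    """Return every element matching TAIL below the toast container."""
    container = ET.fromstring(markup)
    return container.findall(TAIL)


print(spans_under_toast(TOAST_HTML))  # []
```

Nothing matches because the only div inside the container is empty, so there is no flipp-item-dialog/div/h2/span below it to find.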

So if you want to select every flyer name on a page that lists them (e.g. flipp.com/flyers/groceries), use an XPath like this:

//flipp-flyer-listing-item//p[@class="flyer-name"]

That is, we select flipp-flyer-listing-item, which represents the container for each flyer, and then choose the p element whose @class attribute is flyer-name. We skip any levels in between with //, because all that matters is where those two elements are relative to each other, not where they are on the page.
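To see the relative XPath pick up every flyer regardless of nesting depth, here is a sketch that runs it with ElementTree against a small, hypothetical listing fragment. The markup (including the names Flyer A/B/C and the extra wrapper divs) is invented for illustration and kept as well-formed XML, which real rendered pages usually are not. ElementTree needs a leading . on the path but otherwise accepts the same expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing markup: each flyer name sits at a different depth.
LISTING_HTML = """
<div class="listing">
  <flipp-flyer-listing-item>
    <div><p class="flyer-name">Flyer A</p></div>
  </flipp-flyer-listing-item>
  <flipp-flyer-listing-item>
    <div><div><p class="flyer-name">Flyer B</p></div></div>
  </flipp-flyer-listing-item>
  <flipp-flyer-listing-item>
    <p class="flyer-name">Flyer C</p>
    <p class="flyer-date">not a name</p>
  </flipp-flyer-listing-item>
</div>
"""


def flyer_names(markup):
    root = ET.fromstring(markup.strip())
    # "//" skips any number of levels; [@class="..."] filters on the attribute.
    xpath = './/flipp-flyer-listing-item//p[@class="flyer-name"]'
    return [p.text for p in root.findall(xpath)]


print(flyer_names(LISTING_HTML))  # ['Flyer A', 'Flyer B', 'Flyer C']
```

In this toy markup a fixed-depth path such as ./flipp-flyer-listing-item/div/p returns only Flyer A, which is the brittleness point 1 warns about. On a live page with newer Selenium releases, the same expression would be passed as driver.find_elements(By.XPATH, '//flipp-flyer-listing-item//p[@class="flyer-name"]'), with By imported from selenium.webdriver.common.by, reading each element's .text.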

If your goal is to scrape the contents of each flyer (a page like https://flipp.com/flyer/1352064-zehrs-weekly-flyer), you will have to navigate to the flyer's contents first, and then select each item with:

//flipp-flyerview//a[@class="item-container"]/div
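The same approach applies to the flyer-contents XPath. This sketch uses a hypothetical, heavily simplified flipp-flyerview fragment (the page divs, href values, the extra a class="ad" link, and the Item 1/2/3 texts are all invented) to show that the predicate keeps only item-container links and the trailing /div then steps to each link's direct div child. It says nothing about the real page's current markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified flyer-view markup wrapped in html/body.
FLYER_HTML = """
<html><body>
  <flipp-flyerview>
    <div class="page">
      <a class="item-container" href="#1"><div>Item 1</div></a>
      <a class="item-container" href="#2"><div>Item 2</div></a>
    </div>
    <div class="page">
      <a class="item-container" href="#3"><div>Item 3</div></a>
      <a class="ad" href="#x"><div>Not an item</div></a>
    </div>
  </flipp-flyerview>
</body></html>
"""


def flyer_items(markup):
    root = ET.fromstring(markup.strip())
    # Find flipp-flyerview anywhere, then any item-container link below it,
    # then that link's direct div child.
    xpath = './/flipp-flyerview//a[@class="item-container"]/div'
    return [div.text for div in root.findall(xpath)]


print(flyer_items(FLYER_HTML))  # ['Item 1', 'Item 2', 'Item 3']
```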

Note: there are also ways other than XPath to select the items, and I'm leaving navigation aside, since it's not part of the question.
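As one example of a non-XPath approach, the sketch below walks the tree directly with ElementTree's iter() and checks the class attribute by hand. The fragment is hypothetical. The class check treats flyer-name as one whitespace-separated token, similar to how a CSS selector like flipp-flyer-listing-item p.flyer-name would match in Selenium via By.CSS_SELECTOR; that selector is an illustration, not something verified against the live page.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing fragment; the second name carries an extra class.
LISTING_HTML = """
<div class="listing">
  <flipp-flyer-listing-item><p class="flyer-name">Flyer A</p></flipp-flyer-listing-item>
  <flipp-flyer-listing-item><p class="flyer-name featured">Flyer B</p></flipp-flyer-listing-item>
</div>
"""


def flyer_names_without_xpath(markup):
    root = ET.fromstring(markup.strip())
    names = []
    # Visit every listing item, then every <p> below it, in document order.
    for item in root.iter("flipp-flyer-listing-item"):
        for p in item.iter("p"):
            # Match "flyer-name" as one class token instead of requiring
            # the whole class attribute to equal it.
            if "flyer-name" in (p.get("class") or "").split():
                names.append(p.text)
    return names


print(flyer_names_without_xpath(LISTING_HTML))  # ['Flyer A', 'Flyer B']
```

The difference matters when an element carries more than one class: @class="flyer-name" in XPath compares the whole attribute string, so it would skip Flyer B in this fragment, while the token check keeps it.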

answered Feb 05 '26 12:02 by ytrewq
