Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unable to shake off hardcoded delay from my script

I've written a script in vba in combination with selenium to parse all the company names available in a webpage. The webpage has got lazyloading method active so there are only 20 links become visible in each scroll. If I scroll 2 times then the number of links visible are 40 and so on. There are 1000 links available in that webpage. My below script can reach the bottom of that page handling all the scroll and fetch all the names available in that webpage.

However, it is necessary to wait a certain time after each scroll for that webpage to update the content. This is where I've used hardcoded delay but the process of hardcoding thing is very inconsistent and sometimes it makes the browser quit before the completion of the whole operation.

How can I modify this portion .Wait 6000 to make it Explicit Wait instead of Hardcoded Wait.

This is what I've written so far:

Sub Getlinks()
    Dim driver As New ChromeDriver, prevlen&, curlen&
    Dim posts As Object, post As Object

    With driver
        .get "http://fortune.com/fortune500/list/"
        prevlen = .FindElementsByClass("company-title").Count

        Do
            prevlen = curlen
            .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")

            .Wait 6000  ''I like to kick out this hardcoded delay and use explicit wait in place

            Set posts = .FindElementsByClass("company-title")
            curlen = posts.Count
            If prevlen = curlen Then Exit Do
        Loop

        For Each post In posts
            R = R + 1: Cells(R, 1) = post.Text
        Next post
    End With
End Sub
like image 647
SIM Avatar asked Feb 14 '26 03:02

SIM


1 Answers

Here is a completely different approach that doesn't require using a browser, instead it submits a series of web requests. With this approach, waiting for a page to load isn't a concern.

Typically, with lazy loading pages, it will submit a new request to load up the data for the page as you scroll. If you monitor the web traffic you can spot the requests made and emulate those, I have done that below.

The result should be a list of company names, in ascending order in whatever the first sheet of Excel is.

Things you'll need:

Add References to:

  • Microsoft Scripting Runtime
  • Microsoft XML v6.0
  • Add the VBA-JSON code to your project. You can find that here

Edit

Changed the code to keep pulling data from the site, until there is no more items in the list. Thanks @Qharr for pointing this out.

Code


Public Sub SubmitRequest()
    Const baseURL As String = "http://fortune.com/api/v2/list/2358051/expand/item/ranking/asc/"

    Dim Url            As String
    Dim startingNumber As Long
    Dim j              As Long
    Dim getRequest     As MSXML2.XMLHTTP60
    Dim Json           As Object
    Dim Companies      As Object
    Dim Company        As Variant
    Dim CompanyArray   As Variant

    'Create an array to hold each company
    ReDim CompanyArray(0 To 50000)
    'Create a new XMLHTTP object so we can place a get request
    Set getRequest = New MSXML2.XMLHTTP60

    'The api seems to only support returning 100 records at a time
    'So do in batches of 100
    Do
        'Build the url, the format is something like
        '0/100, where 0 is the starting position, and 100 is the ending position
        Url = baseURL & startingNumber & "/" & startingNumber + 100

        With getRequest
            .Open "GET", Url
            .send

            'The response is a JSON object, for this code to work -
            'You'll need this code https://github.com/VBA-tools/VBA-JSON
            'What is returned is a dictionary
            Set Json = JsonConverter.ParseJson(.responseText)
            Set Companies = Json("list-items")

            'Keep checking in batches of 100 until there are no more
            If Companies.Count = 0 Then Exit Do

            'Iterate the dictionary and return the title (which is the name)
            For Each Company In Companies
                CompanyArray(j) = Company("title")
                j = j + 1
            Next

        End With
        startingNumber = startingNumber + 100
   Loop

    ReDim Preserve CompanyArray(j - 1)

    'Dump the data to the first sheet
    ThisWorkbook.Sheets(1).Range("A1:A" & j) = WorksheetFunction.Transpose(CompanyArray)

End Sub

like image 56
Ryan Wildry Avatar answered Feb 16 '26 03:02

Ryan Wildry