How to load lazy content on LinkedIn search page using Selenium

Summary

I am trying to scrape the profile links of all first-degree connections of an account from the LinkedIn search page. Because the page loads the rest of its content dynamically as you scroll down, I cannot get the 'Next' page button, which is at the bottom of the page.

Problem description

https://linkedin.com/search/results/people/?facetGeoRegion=["tr%3A0"]&facetNetwork=["F"]&origin=FACETED_SEARCH&page=YOUR_PAGE_NUMBER

I can navigate to the search page using Selenium and the link above. I want to know how many pages there are so that I can visit them all just by changing the page= parameter of the link.

To implement that, I wanted to check for the existence of the Next button: as long as there is a Next button, I would request the next page for scraping. But unless you scroll down to the bottom of the page, which is where the Next button is, you can find neither the Next button nor the information about the other profiles, because they are not loaded yet.
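As a minimal sketch of the "just change page=" idea, the helper below builds the search URL for a given page number. The class and method names (SearchUrls, pageUrl) are hypothetical, and the query string is simply copied from the link above; whether LinkedIn still accepts it in this form is an assumption.

```java
// Hypothetical helper: builds the people-search URL for one result page.
final class SearchUrls {
    // Query string copied from the link in the question; only the page number varies.
    private static final String BASE =
            "https://linkedin.com/search/results/people/"
            + "?facetGeoRegion=[\"tr%3A0\"]&facetNetwork=[\"F\"]"
            + "&origin=FACETED_SEARCH&page=";

    // Returns the URL for the given 1-based page number.
    static String pageUrl(int page) {
        if (page < 1) {
            throw new IllegalArgumentException("page must be >= 1, got " + page);
        }
        return BASE + page;
    }

    public static void main(String[] args) {
        // Print the URLs of the first three pages.
        for (int page = 1; page <= 3; page++) {
            System.out.println(pageUrl(page));
        }
    }
}
```

Each returned string could then be passed to the driver's get method, stopping once a page no longer shows a Next button.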

Here is how it looks when you do not scroll down and take a screenshot of the whole page using the Firefox screenshot tool.

How I implemented it

I can fix this by hard-coding a scroll-down action into my code and making the driver wait with visibilityOfElementLocated. But I was wondering whether there is a better way than my approach. Also, with this approach, if the driver cannot find the Next button for some reason, the program exits with exit code 1.

When I inspect the requests made while I scroll down the page, they are only requests for images and similar resources, as you can see below. I could not figure out how the page loads more profile information as I scroll.

networks inspection

Source code

Here is how I implemented it in my code. This app is just a simple implementation which is trying to find the Next button on the page.

package com.andreyuhai;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class App 
{
    WebDriver driver;

    public static void main( String[] args )
    {
        Bot bot = new Bot("firefox", false, false, 0, 0, null, null, null);

        int pagination = 1;

        bot.get("https://linkedin.com");
        if(bot.attemptLogin("username", "pw")){
            bot.get("https://www.linkedin.com/" +
                    "search/results/people/?facetGeoRegion=" +
                    "[\"tr%3A0\"]&origin=FACETED_SEARCH&page=" + pagination);


            JavascriptExecutor js = (JavascriptExecutor) bot.driver;

            js.executeScript("scrollBy(0, 2500)");

            WebDriverWait wait = new WebDriverWait(bot.driver, 10);
            wait.until(ExpectedConditions.visibilityOfElementLocated(By.xpath("//button[@class='next']/div[@class='next-text']")));

            WebElement nextButton = bot.driver.findElement(By.xpath("//button[@class='next']/div[@class='next-text']"));


            if(nextButton != null ) {
                System.out.println("Next Button found");
                nextButton.click();
            }else {
                System.out.println("Next Button not found");
            }
        }
    }
}

Another tool I am curious about: LinkedIn Spider

There is a Chrome extension called LinkedIn Spider.

I guess it does exactly what I am trying to achieve, but using JavaScript; I am not sure. When I run this extension on the same search page, it extracts the data without scrolling down or loading the other pages one by one.

So my questions are:

Could you please explain how LinkedIn achieves this? I mean, how does it load profile information as I scroll down if it is not making any requests? I would appreciate any source links or explanations.
Do you have a better (I mean faster) idea for implementing what I am trying to do?
Could you please explain how LinkedIn Spider could be working without scrolling down?

asked Jan 06 '19 17:01

Burak Kaymakci

1 Answer

I checked the div structure and the way LinkedIn displays the results. If you open the URL directly and query the page with the XPath //li[contains(@class,'search-result')], you will find that all the results are already present on the page. LinkedIn shows only 5 results at first and reveals the next 5 as you scroll, but every result is already loaded and can be found with that XPath.

This image highlights the div structure and the results found with the XPath right after opening the URL: https://imgur.com/Owu4NPh
This image highlights the div structure and the results found with the same XPath after scrolling to the bottom of the page: https://imgur.com/7WNR830

You can see that the result set is the same in both cases. However, there is an additional search-result__occlusion-hint part in the <li> tag of the last 5 results; this is how LinkedIn hides the next 5 results and shows only the first 5 initially.

Now for the implementation. I checked that the "Next" button appears only after you scroll through all the results on the page. Scrolling to fixed coordinates is fragile, because the right offset changes with screen and window size, so instead you can collect the results into a list of WebElements, take its size, and scroll to the last element of that list. If there are 10 results, the page is scrolled to the 10th result; if there are only 4, it is scrolled to the 4th. After scrolling, you can check whether the Next button is present by looking at the size of the list returned for the Next button locator: if the size is greater than 0, the Next button is on the page; otherwise it is not, and you can stop the execution there.

To implement this, I used a boolean that starts as true. The code runs in a loop until the boolean becomes false, which happens when the size of the Next button list is 0.
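A minimal sketch of how that marker could be checked, assuming the hint appears as a token in the class attribute (the XPath above suggests this, but it is not confirmed here). The OcclusionCheck class and its method are hypothetical, and the sample strings in main are made up for illustration; with Selenium, the input would come from calling getAttribute("class") on each result element.

```java
// Hypothetical helper: tells whether a result item is still hidden behind
// the occlusion hint, judging only by its class attribute string.
final class OcclusionCheck {
    // Marker observed on the not-yet-shown results (taken from the answer).
    static final String HINT = "search-result__occlusion-hint";

    // Returns true if the class attribute contains the hint as a whole token.
    static boolean isOccluded(String classAttribute) {
        if (classAttribute == null) {
            return false;
        }
        for (String token : classAttribute.trim().split("\\s+")) {
            if (token.equals(HINT)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up class attribute values, for illustration only.
        System.out.println(isOccluded("search-result search-result__occlusion-hint"));
        System.out.println(isOccluded("search-result"));
    }
}
```

Comparing whole tokens rather than using a substring match avoids false positives from longer class names that merely contain the hint text.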

Please refer to the code below:

import java.util.List;

import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;

public class App
{
    // Static so that the static main method and the helpers below can share it
    static WebDriver driver;

    // Runs a script through the driver's JavascriptExecutor
    public static Object executeScript(String script, Object... args) {
        JavascriptExecutor exe = (JavascriptExecutor) driver;
        return exe.executeScript(script, args);
    }

    // Scrolls the window to the element's top-left corner
    public static void scrollToElement(WebElement element) {
        executeScript("window.scrollTo(arguments[0],arguments[1])", element.getLocation().x, element.getLocation().y);
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder: put your direct search URL here
        String url = "YOUR_SEARCH_URL";
        // You can change the driver to bot according to your use case
        driver = new FirefoxDriver();
        // Open the URL and perform the login after that, if necessary
        driver.get(url);
        // Wait for the page to load completely
        Thread.sleep(10000);
        // Stays true for as long as a Next button is found
        boolean nextButtonPresent = true;
        while (nextButtonPresent) {
            // Fetching the results on the page by the xpath
            List<WebElement> results = driver.findElements(By.xpath("//li[contains(@class,'search-result')]"));
            // Nothing to scroll to if the page has no results
            if (results.isEmpty()) {
                break;
            }
            // Scrolling to the last element in the list
            scrollToElement(results.get(results.size() - 1));
            Thread.sleep(2000);

            // Checking if the Next button is present on the page
            List<WebElement> nextButton = driver.findElements(By.xpath("//button[@class='next']"));
            if (nextButton.size() > 0) {
                // If yes, clicking on it and waiting for the next page
                nextButton.get(0).click();
                Thread.sleep(10000);
            } else {
                // Else setting the boolean to false to end the loop
                nextButtonPresent = false;
                System.out.println("Next button is not present, so ending the script");
            }
        }
    }
}
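The fixed Thread.sleep calls in the code above could be replaced by polling for a condition, which is roughly what an explicit wait such as the WebDriverWait used in the question does. The standard-library sketch below only illustrates that idea; the Polling class and waitUntil method are hypothetical names, not part of Selenium.

```java
// Hypothetical helper: polls a condition instead of sleeping for a fixed time.
final class Polling {
    // Re-checks the condition every intervalMillis until it holds (returns true)
    // or until timeoutMillis has elapsed (returns false).
    static boolean waitUntil(java.util.function.BooleanSupplier condition,
                             long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Example condition: becomes true about 300 ms after start.
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2000, 50);
        System.out.println("condition met: " + ok);
    }
}
```

In the loop above, the condition could be something like the Next button list being non-empty, so the script moves on as soon as the button appears rather than always waiting the full interval.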

answered Oct 21 '22 18:10

Sameer Arora

Tags:

java

selenium

selenium-webdriver

web-scraping