
How to loop calls to a pagination URL with C# HttpClient to download all pages of JSON results

My 1st question, so please be kind... :)

I'm using the C# HttpClient to call the Dice Jobs API endpoint.

Here's the endpoint (it doesn't require a key): http://service.dice.com/api/rest/jobsearch/v1/simple.json?skill=ruby

This gives me JSON like so.

{
  "count": 1117,
  "firstDocument": 1,
  "lastDocument": 50,
  "nextUrl": "\/api\/rest\/jobsearch\/v1\/simple.json?areacode=&country=&state=&skill=ruby&city=&text=&ip=&diceid=&page=2",
  "resultItemList": [
    {
      "detailUrl": "http:\/\/www.dice.com\/job\/result\/90887031\/918715?src=19",
      "jobTitle": "Sr Security Engineer",
      "company": "Accelon Inc",
      "location": "San Francisco, CA",
      "date": "2017-03-30"
    },
    {
      "detailUrl": "http:\/\/www.dice.com\/job\/result\/cybercod\/BB7-13647094?src=19",
      "jobTitle": "Platform Engineer - Ruby on Rails, AWS",
      "company": "CyberCoders",
      "location": "New York, NY",
      "date": "2017-04-16"
    }
 ]
}

I've pasted a complete JSON snippet so you can use it in your answer. The full results are too long to include here.

Here are the C# classes.

using Newtonsoft.Json;
using System.Collections.Generic;

namespace MyNameSpace
{
    public class DiceApiJobWrapper
    {
        public int count { get; set; }
        public int firstDocument { get; set; }
        public int lastDocument { get; set; }
        public string nextUrl { get; set; }

        [JsonProperty("resultItemList")]
        public List<DiceApiJob> DiceApiJobs { get; set; }
    }

    public class DiceApiJob
    {
        public string detailUrl { get; set; }
        public string jobTitle { get; set; }
        public string company { get; set; }
        public string location { get; set; }
        public string date { get; set; }
    }
}

When I invoke the URL using HttpClient and deserialize using JSON.NET, I do get the data back properly.

Here's the code I am calling from my Console App's Main method (hence the static list, I think this could be better refactored??)

    private static List<DiceApiJob> GetDiceJobs()
    {
        HttpClient httpClient = new HttpClient();
        var jobs = new List<DiceApiJob>();

        var task = httpClient.GetAsync("http://service.dice.com/api/rest/jobsearch/v1/simple.json?skill=ruby")
          .ContinueWith((taskwithresponse) =>
          {
              var response = taskwithresponse.Result;
              var jsonString = response.Content.ReadAsStringAsync();
              jsonString.Wait();

              var result =  JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString.Result);
              if (result != null)
              {
                  if (result.DiceApiJobs.Any())
                      jobs = result.DiceApiJobs.ToList();

                  if (result.nextUrl != null)
                  {
                      //
                      // do this GetDiceJobs again in a loop? How?? Any other efficient elegant way??
                  }
              }
          });
        task.Wait();

        return jobs;
    }

But now, how do I check whether there are more jobs using the nextUrl field? I know I can check that it's not null, and if it isn't, that means there are more jobs to pull down.


How do I do this recursively, without hanging, and with some delays so I don't exceed the API limits? I think I have to use the TPL (Task Parallel Library), but I am quite baffled.

Thank you! ~Sean

SeanPatel asked Apr 17 '17 05:04


1 Answer

If you are concerned about response time of your app and would like to return some results before you actually get all pages/data from the API, you could run your process in a loop and also give it a callback method to execute as it gets each page of data from the API.

Here is a sample:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using MyNameSpace; // namespace of the DiceApiJobWrapper and DiceApiJob classes from the question
using Newtonsoft.Json;

public class Program
{
    public static void Main(string[] args)
    {
        // Main is synchronous here, so block once at the top level.
        var jobs = GetDiceJobsAsync(Program.ResultCallBack).GetAwaiter().GetResult();
        Console.WriteLine($"\nAll {jobs.Count} jobs displayed");
        Console.ReadLine();
    }

    private static async Task<List<DiceApiJob>> GetDiceJobsAsync(Action<DiceApiJobWrapper> callBack = null)
    {
        var jobs = new List<DiceApiJob>();
        using (var httpClient = new HttpClient())
        {
            httpClient.BaseAddress = new Uri("http://service.dice.com");
            var nextUrl = "/api/rest/jobsearch/v1/simple.json?skill=ruby";

            do
            {
                // Await the request directly. Chaining ContinueWith with an async lambda
                // produces a Task<Task>, so the loop condition could run before nextUrl is updated.
                var response = await httpClient.GetAsync(nextUrl);
                if (!response.IsSuccessStatusCode)
                    break; // End loop if we get an error response.

                string jsonString = await response.Content.ReadAsStringAsync();
                var result = JsonConvert.DeserializeObject<DiceApiJobWrapper>(jsonString);

                // Build the full list to return later after the loop.
                if (result?.DiceApiJobs != null)
                    jobs.AddRange(result.DiceApiJobs);

                // Run the callback method, passing the current page of data from the API.
                if (result != null)
                    callBack?.Invoke(result);

                // Get the URL for the next page; a null or empty value ends the loop.
                nextUrl = result?.nextUrl;
            } while (!string.IsNullOrEmpty(nextUrl));
        }
        return jobs;
    }

    private static void ResultCallBack(DiceApiJobWrapper jobSearchResult)
    {
        if (jobSearchResult != null && jobSearchResult.count > 0)
        {
            Console.WriteLine($"\nDisplaying jobs {jobSearchResult.firstDocument} to {jobSearchResult.lastDocument}");
            foreach (var job in jobSearchResult.DiceApiJobs)
            {
                Console.WriteLine(job.jobTitle);
                Console.WriteLine(job.company);
            }
        }
    }
}

Note that the sample above lets the callback method access each page of data as GetDiceJobsAsync receives it. In this case, the console displays each page as it becomes available. If you do not want the callback, simply pass nothing to GetDiceJobsAsync.

GetDiceJobsAsync also returns all the jobs when it completes, so you can instead act on the whole list at the end.
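
If you want to exercise the paging logic without hitting the API, one option is to separate the loop from the HTTP call. This is only a rough sketch under my own assumptions: the `Page` type and the `Pager.CollectAllAsync` helper are hypothetical names, not part of the sample above or of any library. `Page` stands in for DiceApiJobWrapper, and `fetchPage` stands in for the HttpClient request plus deserialization.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for DiceApiJobWrapper: one page of items plus the next page's URL.
public class Page
{
    public List<string> Items { get; set; } = new List<string>();
    public string NextUrl { get; set; }
}

public static class Pager
{
    // Follows NextUrl until it is null or empty, invoking onPage for every page
    // and returning all items once the last page has been read.
    public static async Task<List<string>> CollectAllAsync(
        string firstUrl,
        Func<string, Task<Page>> fetchPage,
        Action<Page> onPage = null)
    {
        var all = new List<string>();
        var url = firstUrl;
        while (!string.IsNullOrEmpty(url))
        {
            Page page = await fetchPage(url);
            if (page == null)
                break; // treat a missing page as the end of the results

            if (page.Items != null)
                all.AddRange(page.Items);

            onPage?.Invoke(page);
            url = page.NextUrl;
        }
        return all;
    }
}
```

In the real program, `fetchPage` would wrap the GetAsync and DeserializeObject calls, while a test can pass a lambda that returns canned pages.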

As for API rate limits, you can insert a small delay at the end of each loop iteration, just before the next request. When I tried it, the API did not throttle my requests, so I did not include a delay in the sample.
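
For illustration, here is roughly where such a delay could go. This is a sketch, not a tested rate limiter: `Throttle.ForEachPageAsync`, the `fetchNext` delegate, and the `maxPages` cap are hypothetical names I made up, and the right delay value depends on the API's actual limits, which I don't know. `Task.Delay` is the standard non-blocking way to pause inside an async method.

```csharp
using System;
using System.Threading.Tasks;

public static class Throttle
{
    // Calls fetchNext repeatedly until it returns null or an empty string,
    // pausing for `delay` between consecutive requests. Returns the page count.
    public static async Task<int> ForEachPageAsync(
        string firstUrl,
        Func<string, Task<string>> fetchNext, // processes one page, returns the next URL or null
        TimeSpan delay,
        int maxPages = 100) // hypothetical safety cap against runaway loops
    {
        int pages = 0;
        var url = firstUrl;
        while (!string.IsNullOrEmpty(url) && pages < maxPages)
        {
            url = await fetchNext(url);
            pages++;

            // Wait only when another request will follow.
            if (!string.IsNullOrEmpty(url))
                await Task.Delay(delay);
        }
        return pages;
    }
}
```

Because the pause is awaited rather than done with Thread.Sleep, the calling thread is not blocked while waiting.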

Frank Fajardo answered Nov 17 '22 23:11