
How to get all of the records in COVID-19 Data Lake linelistrecords

Tags: json, r

I'd like to use the https://api.c3.ai/covid/api/1/linelistrecord/fetch API, but I only get 2000 records back. I know that there are more than 2000 records. How do I get all of them?

Here's my code in R:

library(tidyverse)
library(httr)
library(jsonlite)

resp <- POST(
  "https://api.c3.ai/covid/api/1/linelistrecord/fetch",
  body = list(
    spec = {}
  ) %>% toJSON(auto_unbox = TRUE),
  accept("application/json")
)

length(content(resp)$objs)

I get 2000 records.

rsyoung asked Oct 23 '25 02:10


1 Answer

The spec you are passing in has the following optional fields, among others:

  • limit // maximum number of objects to return
  • offset // offset to use for paged reads

The default value of limit is 2000.

Along with the array of objects (objs), the fetch result includes a boolean field called hasMore, which indicates whether there are more records in the underlying data store.

You can write a loop that ends once hasMore is false. Start with an offset of 0 and a limit of n (say, n = 2000), then increase offset by n on each iteration.

library(tidyverse)
library(httr)
library(jsonlite)

limit <- 2000
offset <- 0
hasMore <- TRUE
all_objs <- c()

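while (hasMore) {
  resp <- POST(
    "https://api.c3.ai/covid/api/1/linelistrecord/fetch",
    body = list(
      spec = list(
        limit = limit,
        offset = offset,
        filter = "contains(location, 'California')" # just as an example, to cut down on the dataset
      )
    ) %>% toJSON(auto_unbox = TRUE),
    accept("application/json")
  )
  stop_for_status(resp) # fail early on an HTTP error instead of paging on a bad response
  result <- content(resp) # parse the response body once per page
  all_objs <- c(all_objs, result$objs) # append this page's records
  hasMore <- isTRUE(result$hasMore) # a missing hasMore is treated as "no more pages"
  offset <- offset + limit # move on to the next page
}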

length(all_objs)
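
If you want to check the offset/hasMore bookkeeping without hitting the API, here is a minimal sketch that runs offline. Both fake_fetch and fetch_all are hypothetical names made up for this illustration (they are not part of the C3.ai API or of httr), and fake_fetch only assumes the result shape described above: a list with objs and hasMore.

```r
# Hypothetical stand-in for the API: serves pages from a local list and
# returns a result shaped like the fetch response (objs + hasMore).
fake_fetch <- function(limit, offset, data = as.list(1:4500)) {
  n <- length(data)
  if (offset >= n) {
    return(list(objs = list(), hasMore = FALSE))
  }
  end <- min(offset + limit, n)
  list(objs = data[(offset + 1):end], hasMore = end < n)
}

# Generic paging loop: request pages until hasMore is no longer TRUE.
# fetch_page is any function taking limit and offset and returning
# a list with objs and hasMore.
fetch_all <- function(fetch_page, limit = 2000) {
  offset <- 0
  pages <- list()
  repeat {
    page <- fetch_page(limit = limit, offset = offset)
    pages[[length(pages) + 1]] <- page$objs
    if (!isTRUE(page$hasMore)) break
    offset <- offset + limit
  }
  # Flatten the list of pages into one list of objects
  do.call(c, pages)
}

all_objs <- fetch_all(fake_fetch, limit = 2000)
length(all_objs) # 4500: two full pages of 2000 plus a final page of 500
```

Collecting the pages in a list and flattening once at the end avoids growing all_objs with c() on every iteration, which can matter when there are many pages. To use it against the real endpoint, you would pass a function that wraps the POST call and returns content(resp).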
GothamCityRises answered Oct 24 '25 19:10


