Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parse multi-level json file in r

I have a pretty good understanding of R but am new to JSON file types and best practices for parsing. I'm having difficulties building a data frame from a raw JSON file. The JSON file (data below) is made up of repeated measure data that has multiple observations per user.

When the raw file is read into r

 jdata<-read_json("./raw.json")

It comes in as a "List of 1" with that list being user_ids. Within each user_id are further lists, like so -

jdata$user_id$`sjohnson`$date$`2020-09-25`$city

The very last position actually splits into two options - $city or $zip. At the highest level, there are about 89 users in the complete file.

My goal would be to end up with a rectangular data frame or multiple data frames that I can merge together like this - where I don't actually need the zip code.

example table

I've tried jsonlite along with tidyverse and the farthest I seem to get is a data frame with one variable at the smallest level - cities and zip codes alternating rows using this

df  <-  as.data.frame(matrix(unlist(jdata), nrow=length(unlist(jdata["users"]))))

Any help/suggestions to get closer to the table above would be much appreciated. I have a feeling I'm failing at looping it back through the different levels.

Here is an example of the raw json file structure:

 {
  "user_id": {
    "sjohnson": {
      "date": {
        "2020-09-25": {
              "city": "Denver",
              "zip": "80014"
            },
            "2020-10-01": {
              "city": "Atlanta",
              "zip": "30301"
            },
            "2020-11-04": {
              "city": "Jacksonville",
              "zip": "14001"
            }
         },
    "asmith: {
      "date": {
        "2020-10-16": {
              "city": "Cleavland",
              "zip": "34321"
        },
        "2020-11-10": {
              "City": "Elmhurst",
              "zip": "00013
            },
            "2020-11-10 08:49:36": {
              "location": null,
              "timestamp": 1605016176013
            }
          }
 
like image 896
Sescro Avatar asked May 25 '26 14:05

Sescro


1 Answers

Another (straightforward) solution doing the heavy-lifting with rrapply() in the rrapply-package:

library(rrapply)
library(dplyr)

rrapply(jdata, how = "melt") %>%
  filter(L5 == "city") %>%
  select(user_id = L2, date = L4, city = value)

#>    user_id       date         city
#> 1 sjohnson 2020-09-25       Denver
#> 2 sjohnson 2020-10-01      Atlanta
#> 3 sjohnson 2020-11-04 Jacksonville
#> 4   asmith 2020-10-16    Cleavland
#> 5   asmith 2020-11-10     Elmhurst

Data

jdata <- jsonlite::fromJSON('{
   "user_id": {
    "sjohnson": {
       "date": {
        "2020-09-25": {
           "city": "Denver",
          "zip": "80014"
        },
        "2020-10-01": {
          "city": "Atlanta",
          "zip": "30301"
         },
        "2020-11-04": {
          "city": "Jacksonville",
          "zip": "14001"
        }
       }
    },
    "asmith": {
       "date": {
         "2020-10-16": {
           "city": "Cleavland",
           "zip": "34321"
         },
        "2020-11-10": {
           "city": "Elmhurst",
           "zip": "00013"
         },
         "2020-11-10 08:49:36": {
          "location": null,
          "timestamp": 1605016176013
        }
       }
     }
   }
}')
like image 63
Joris C. Avatar answered May 28 '26 09:05

Joris C.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!