I'm using the tidyjson package to parse a JSON string and extract the key values into columns. The JSON is nested, and while I can drill down into a node, I can't figure out a way to go back up to the previous level. The code is below:
library(tidyjson)
library(data.table)
library(dplyr)
input <- '{
"name": "Bob",
"age": 30,
"social": {
"married": "yes",
"kids": "no"
},
"work": {
"title": "engineer",
"salary": 5000
}
}'
output <- input %>% as.tbl_json() %>%
spread_values(name = jstring("name"),
age = jnumber("age")) %>%
enter_object("social") %>%
spread_values(married = jstring("married"),
kids = jstring("kids")) %>%
#### I would need an exit_object() here
enter_object("work") %>%
spread_values(title = jstring("title"),
salary = jnumber("salary"))
There's a note in the documentation:
"Note that there are often situations where there are multiple arrays or objects of differing types that exist at the same level of the JSON hierarchy. In this case, you need to use enter_object() to enter each of them in separate pipelines to create separate data.frames that can then be joined relationally."
As such I've been staging my tidyjson commands and putting the outputs together with merge, e.g.:
# first the high-level values
output_table <- input_tbl_json %>%
spread_values(val1 = jstring('val1'),
val2 = jnumber('val2'))
# then enter an object and get something from inside, merging it as a new column
output_table <- merge(output_table,
input_tbl_json %>%
enter_object('thing') %>%
spread_values(val3 = jstring('thing1')),
by = c('document.id'))
The output table's columns should look like | document.id | val1 | val2 | val3 |
That workflow may fall over with operations like gather_keys() that add rows, since the merge on document.id would then repeat the other columns for each added row, but I haven't had occasion to test it.
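For what it's worth, here is a minimal sketch of that staged approach applied to the JSON from the question. It assumes a tidyjson version (0.3 or later, as far as I know) where as_tibble() drops the JSON payload so the pieces can be joined with dplyr; the names doc, top, social, work, staged and by_path are made up for illustration. The last step relies on jstring() and jnumber() accepting a sequence of names as a path, which is how I read the spread_values() documentation, so treat it as an assumption to check against your installed version.

```r
library(tidyjson)
library(dplyr)

input <- '{
  "name": "Bob",
  "age": 30,
  "social": {"married": "yes", "kids": "no"},
  "work": {"title": "engineer", "salary": 5000}
}'

# Parse once, then start a separate pipeline for each object at the same level
doc <- as.tbl_json(input)

# Stage 1: top-level values
top <- doc %>%
  spread_values(name = jstring("name"),
                age = jnumber("age")) %>%
  as_tibble()  # assumption: drops the JSON column so a plain join works

# Stage 2: values inside "social"
social <- doc %>%
  enter_object("social") %>%
  spread_values(married = jstring("married"),
                kids = jstring("kids")) %>%
  as_tibble()

# Stage 3: values inside "work"
work <- doc %>%
  enter_object("work") %>%
  spread_values(title = jstring("title"),
                salary = jnumber("salary")) %>%
  as_tibble()

# Join the stages back together on the document.id column tidyjson adds
staged <- top %>%
  left_join(social, by = "document.id") %>%
  left_join(work, by = "document.id")

# Possible alternative (assumption: the j-functions take a path of names),
# which avoids entering the objects at all
by_path <- doc %>%
  spread_values(name = jstring("name"),
                married = jstring("social", "married"),
                salary = jnumber("work", "salary")) %>%
  as_tibble()
```

The joins keep one row per document here because none of the stages adds rows; the caveat above about row-adding operations still applies.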