R: Generic flattening of JSON to data.frame

This question is about a generic mechanism for converting any collection of non-cyclical homogeneous or heterogeneous data structures into a dataframe. This can be particularly useful when dealing with the ingestion of many JSON documents or with a large JSON document that is an array of dictionaries.

There are several SO questions that deal with manipulating deeply nested JSON structures and turning them into dataframes using functionality such as plyr, lapply, etc. All the questions and answers I have found are about specific cases as opposed to offering a general approach for dealing with collections of complex JSON data structures.

In Python and Ruby I've been well-served by implementing a generic data structure flattening utility that uses the path to a leaf node in a data structure as the name of the value at that node in the flattened data structure. For example, the value my_data[['x']][[2]][['y']] would appear as result[['x.2.y']].
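The path-naming idea described above could look roughly like the following in Python. This is only a minimal sketch, not the utility I actually use: the function name `flatten`, the `sep` separator, and the `index_base` parameter are all made up for illustration. List positions are numbered from 1 here so that the resulting names line up with the R example `x.2.y`.

```python
def flatten(node, prefix="", sep=".", index_base=1):
    """Return a flat dict mapping the path to each leaf onto the leaf's value.

    Hypothetical helper for illustration. index_base=1 is an assumption
    chosen to mirror R's 1-based indexing in the generated names.
    """
    if isinstance(node, dict):
        items = ((str(k), v) for k, v in node.items())
    elif isinstance(node, (list, tuple)):
        items = ((str(i), v) for i, v in enumerate(node, start=index_base))
    else:
        # Anything that is not a dict or sequence is treated as a leaf.
        return {prefix: node}

    flat = {}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten(value, path, sep, index_base))
    return flat


my_data = {"x": [{"y": 1}, {"y": 2}], "z": "a"}
result = flatten(my_data)
# result["x.2.y"] is the value found at x -> second element -> y
```

Note that in this sketch empty dicts and empty lists contribute no columns at all, and keys that already contain the separator would produce ambiguous names; a real implementation would need to decide how to handle both.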

If one has a collection of these data structures that may not be entirely homogeneous, the key to flattening them into a dataframe is to discover the names of all possible dataframe columns, e.g., by taking the union of all keys/names of the values in the individually flattened data structures.
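To make the column-discovery step concrete, here is a rough Python sketch of the idea. It assumes the records have already been flattened into dicts keyed by path; the name `to_table` and the `missing` placeholder are hypothetical, and in R the placeholder would presumably be `NA`.

```python
def to_table(flat_records, missing=None):
    """Build (columns, rows) from a list of already-flattened dicts.

    Hypothetical helper for illustration. Columns are the union of all
    keys, kept in first-seen order; absent values are filled with `missing`.
    """
    columns = []
    seen = set()
    for record in flat_records:
        for key in record:
            if key not in seen:
                seen.add(key)
                columns.append(key)

    rows = [[record.get(col, missing) for col in columns]
            for record in flat_records]
    return columns, rows


flat_records = [
    {"id": 1, "x.1.y": 10},
    {"id": 2, "tags.1": "a"},
]
columns, rows = to_table(flat_records)
```

The two passes (one to collect names, one to fill rows) mean every record is visited twice, but the full set of columns is known before any row is allocated.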

This seems like a common pattern and so I'm wondering whether someone has already built this for R. If not, I'll build it but, given R's unique promise-based data structures, I'd appreciate advice on an implementation approach that minimizes heap thrashing.

Sim asked Jul 19 '12 03:07


1 Answer

The jsonlite package is a fork of RJSONIO specifically designed to make conversion between JSON and data frames easier. You don't provide any example JSON data, but I think this might be what you are looking for. Have a look at this blog post or the vignette.

Jeroen Ooms answered Sep 21 '22 19:09