Transposing JSON list-of-dictionaries for analysis in R

Tags: json, r

I have experimental data expressed as dicts of key-value pairs for each experiment. A set of related experiments is serialized as a list of these dicts in JSON. This is parseable in R via the rjson package, but the data is loaded in a form that is challenging to analyze:

data <- fromJSON('[{"k1":"v1","k2":"v2"}, {"k1":"v3","k2":"v4"}]')

yields

[[1]]
[[1]]$k1
[1] "v1"

[[1]]$k2
[1] "v2"


[[2]]
[[2]]$k1
[1] "v3"

[[2]]$k2
[1] "v4"

Attempting to turn this into a data.frame directly with as.data.frame(data) yields:

  k1 k2 k1.1 k2.1
1 v1 v2   v3   v4

clearly viewing the sequence of key/value pairs across all experiments as a flat one-dimensional list.
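
A minimal sketch of why this happens, assuming the parsed structure is rebuilt by hand as the nested list that fromJSON returns above (so the snippet does not depend on rjson): as.data.frame treats each top-level list element as a group of columns and binds the groups side by side, so two experiments with two keys each become one row with four columns.

```r
# Hand-built stand-in for the parsed JSON (assumption: same shape as the
# fromJSON result shown above).
data <- list(list(k1 = "v1", k2 = "v2"),
             list(k1 = "v3", k2 = "v4"))

# Each inner list contributes its own columns; nothing is stacked row-wise.
flat <- as.data.frame(data)

dim(flat)      # one row, four columns
length(data)   # two experiments, which is the row count we actually want
```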

What I want is a more conventional table with a row for each experiment, and a column for each unique key:

  k1 k2
1 v1 v2
2 v3 v4

How can I cleanly express this transform in R?

jrk asked Feb 14 '10 04:02


1 Answer

The l*ply functions can be your best friend when doing list processing. Try this:

> library(plyr)
> ldply(data, data.frame)
  k1 k2
1 v1 v2
2 v3 v4

plyr does some very nice processing behind the scenes to deal with things like irregular lists (e.g. when each list doesn't contain the same number of elements). This is very common with JSON and XML, and is tricky to handle with the base functions.

Or alternatively using base functions:

> do.call("rbind", lapply(data, data.frame))

You can use rbind.fill (from plyr) instead of rbind if you have irregular lists, but I'd advise just using plyr from the beginning to make your life easier.
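
If plyr is not available, the same padding idea can be sketched in base R. This is only an illustration under stated assumptions: records_to_df is a hypothetical helper name (not part of plyr or base R), and every value in each record is assumed to be a length-one scalar.

```r
# Hypothetical helper: pads each record with NA for any key it lacks,
# then stacks the one-row data frames.
records_to_df <- function(records) {
  # Union of all keys, in order of first appearance.
  keys <- unique(unlist(lapply(records, names)))
  rows <- lapply(records, function(rec) {
    missing <- setdiff(keys, names(rec))
    rec[missing] <- NA                     # add absent keys as NA
    as.data.frame(rec[keys], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Irregular input: keys in a different order, and a key missing per record.
x <- list(list(k1 = 2, k2 = 3),
          list(k2 = 100, k1 = 200),
          list(k1 = 5, k3 = 9))
df <- records_to_df(x)
print(df)
```

Reordering every record to the same key vector before rbind keeps the columns aligned even when the keys arrive in a different order.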

Edit:

Regarding your more complicated example, using Hadley's suggestion deals with this easily:

> x<-list(list(k1=2,k2=3),list(k2=100,k1=200),list(k1=5, k3=9))
> ldply(x, data.frame)
   k1  k2 k3
1   2   3 NA
2 200 100 NA
3   5  NA  9

Shane answered Oct 16 '22 23:10