
R list(structure(list())) to data frame

I have a JSON data source providing a list of hashes:

[
  { "a": "foo",
    "b": "sdfshk"
  },
  { "a": "foo",
    "b": "ihlkyhul"
  }
]

I use fromJSON() in the rjson package to convert that to an R data structure. It returns:

list(
  structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
  structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b"))
)

I need to get this into an R data frame, but data.frame() turns that into a single-row data frame with four columns instead of a 2x2 data frame as expected. I lack the R-fu to do the transform from one to the other, though it looks like it should be straightforward.
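For reference, the behaviour described above can be reproduced with a minimal sketch in base R. The names `x` and `wide` are only illustrative, and the list is built by hand rather than through `fromJSON()`, on the assumption that it matches the structure shown earlier.

```r
# Hand-built stand-in for the fromJSON() result shown above.
x <- list(
  list(a = "foo", b = "sdfshk"),
  list(a = "foo", b = "ihlkyhul")
)

# data.frame() treats each inner list as a set of columns, so the two
# records end up side by side instead of stacked as rows.
wide <- data.frame(x)
dim(wide)  # one row, four columns rather than 2 x 2
```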

Bonus points:

The actual problem is a bit more complex, because the JSON data source isn't as regular as I show above. The objects it returns vary in type. That is, the field set in each can be one of a few different types:

[
  { "a": "foo",
    "b": "asdfhalsdhfla"
  },
  { "a": "bar",
    "c": "akjdhflakjhsdlfkah",
    "d": "jfhglskhfglskd"
  },
  { "a": "foo",
    "b": "dfhlkhldsfg"
  }
]

As you can see, the "a" field in each object is a type tag, indicating which other fields the object will have.

I'm not too particular how the solution copes with this.

It wouldn't be horrible if the two object types were just mooshed together, so you get columns a, b, c, and d, and the rows simply have N/A or NULL values where the JSON source object doesn't have a value for a given field. I believe I can clean the resulting data frame with subset(df, a == "foo"). I'll end up with some empty columns that way, but it won't matter to my program.

It would be better if the solution provides a way to select which JSON source rows go into the data frame and which get rejected, so the result has only the columns and rows actually required.

Warren Young asked Sep 20 '12 08:09


1 Answer

If you have a jagged list that you want to convert to a data.frame, you can use rbind.fill from Hadley's plyr package. It has saved my neck on a couple of occasions. Let me know if this is what you're looking for. Note that I extended your first example with a third element that contains only "b", to make the list jagged.

> x <- list(
+         structure(list(a = "foo", b = "sdfshk"), .Names = c("a", "b")),
+         structure(list(a = "foo", b = "ihlkyhul"), .Names = c("a", "b")),
+         structure(list(b = "asdf"), .Names = "b")
+ )
> 
> library(plyr)
> do.call("rbind.fill", lapply(x, as.data.frame))
     a        b
1  foo   sdfshk
2  foo ihlkyhul
3 <NA>     asdf
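
For the bonus part of the question (keeping only records of one type), one possible approach is to filter the list on the "a" tag before building the data frame. The following is a minimal sketch in base R, not a tested recipe for the real data source: `records`, `foo_records`, and `foo_df` are illustrative names, and the list is written out by hand on the assumption that it matches what fromJSON() returns for the irregular JSON.

```r
# Hand-built stand-in for the irregular JSON in the question.
records <- list(
  list(a = "foo", b = "asdfhalsdhfla"),
  list(a = "bar", c = "akjdhflakjhsdlfkah", d = "jfhglskhfglskd"),
  list(a = "foo", b = "dfhlkhldsfg")
)

# Keep only records whose type tag is "foo". identical() also copes with
# records that have no "a" field at all, where rec$a is NULL.
foo_records <- Filter(function(rec) identical(rec$a, "foo"), records)

# The surviving records share the same fields, so plain rbind() suffices here.
foo_df <- do.call(
  rbind,
  lapply(foo_records, as.data.frame, stringsAsFactors = FALSE)
)
foo_df
```

Filtering first means the result contains only the columns that "foo" records actually have. If records of the same type can still differ in their fields, swapping rbind for rbind.fill as in the transcript above should fill the gaps with NA.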
Roman Luštrik answered Nov 03 '22 01:11