Convert multiple list elements to separate data.frame columns

Tags:

r

I am trying to convert a list built from an API's JSON data into a data.frame. Using fromJSON, I get a nested list structure, and I need to join this data with some other data frames.

The list is nested (multi-dimensional). I have been trying to convert multiple elements into separate data.frame columns so that the result matches the structure of the other data frames and can be joined with them. I am sure there is an elegant way to do this, but I can't seem to find one. In the worst case, I might end up using a for loop.

Any help would be appreciated!

Here is the sample data to create the list:

mylist <- list(structure(list(
      categoryName = "cat1", 
      parent_categories = "parent1", 
      url = "/xyx.com/bca/"), 

      .Names = c("categoryName", "parent_categories", "url")), 

      structure(list(
      categoryName = "cat2", 
      parent_categories = c("parent2", "parent3", "parent4"), 
      url = "/abc.com/bca"), 

      .Names = c("categoryName", "parent_categories", "url"))
     )

The output I want should look like this:

  categoryName parent_categories_1 parent_categories_2 parent_categories_3           url
1         cat1             parent1                  NA                  NA /xyx.com/bca/
2         cat2             parent2             parent3             parent4  /abc.com/bca

Below is what I have used. It is very close, but it does not give the desired result:

library(plyr)
ldply(mylist, function(x){ data.frame(x) })

My current output:

  categoryName parent_categories           url
1         cat1           parent1 /xyx.com/bca/
2         cat2           parent2  /abc.com/bca
3         cat2           parent3  /abc.com/bca
4         cat2           parent4  /abc.com/bca
Dev Patel asked Aug 28 '13 19:08


2 Answers

This seems a little more straightforward to me:

  1. melt your list
  2. Add a "time" variable to ensure unique combinations of L1 and L2 in the molten data.frame
  3. Use dcast to get your wide format data.frame

library(reshape2)
x <- melt(mylist)
x$time <- with(x, ave(L2, L1, L2, FUN = seq_along))
dcast(x, L1 ~ L2 + time, value.var="value")
#   L1 categoryName_1 parent_categories_1 parent_categories_2 parent_categories_3         url_1
# 1  1           cat1             parent1                <NA>                <NA> /xyx.com/bca/
# 2  2           cat2             parent2             parent3             parent4  /abc.com/bca
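
To see why step 2 is needed, here is a small base R sketch of just the "time" step. The data frame below is typed by hand as a stand-in for the molten data (an assumption for illustration; the column names value, L2 and L1 are the ones used in the dcast call above), so it runs without reshape2:

```r
# Hand-built stand-in for melt(mylist); assumed layout, one row per leaf value
x <- data.frame(
  value = c("cat1", "parent1", "/xyx.com/bca/",
            "cat2", "parent2", "parent3", "parent4", "/abc.com/bca"),
  L2 = c("categoryName", "parent_categories", "url",
         "categoryName", "parent_categories", "parent_categories",
         "parent_categories", "url"),
  L1 = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Number the rows within each (L1, L2) group: 1, 2, 3, ...
# ave() returns a vector of the same type as its first argument,
# so the counter comes back as character here
x$time <- with(x, ave(L2, L1, L2, FUN = seq_along))

x[x$L2 == "parent_categories", c("L1", "L2", "time")]
```

Without the counter, the three parent_categories rows for list element 2 share the same L1 and L2, so dcast would have more than one value per cell and would need to aggregate them. With it, each row maps to its own L2_time column.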
A5C1D2H2I1M1N2O1R2T1 answered Sep 29 '22 13:09


Here's one approach but I'm sure there's a better way:

mylist2 <- lapply(lapply(mylist, unlist), function(x) {
    names(x)[names(x) == "parent_categories"] <- "parent_categories1"
    data.frame(t(x))
})

library(plyr)
rbind.fill(mylist2)

##   categoryName parent_categories1           url parent_categories2 parent_categories3
## 1         cat1            parent1 /xyx.com/bca/               <NA>               <NA>
## 2         cat2            parent2  /abc.com/bca            parent3            parent4

Explanation:

  1. I unlist each of the nested lists into a list of vectors
  2. I rename "parent_categories" to "parent_categories1" for those with just one parent category
  3. I use plyr's rbind.fill to splice it together

You can re-arrange the column order in several ways, but that's fairly straightforward.
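
One possible way to do that re-ordering, as a minimal base R sketch. The data frame df is typed by hand to mirror the rbind.fill output above (an assumption, so the snippet runs without plyr), and parent_cols is a helper name introduced only for this example:

```r
# Hand-built stand-in for the rbind.fill(mylist2) result shown above
df <- data.frame(
  categoryName       = c("cat1", "cat2"),
  parent_categories1 = c("parent1", "parent2"),
  url                = c("/xyx.com/bca/", "/abc.com/bca"),
  parent_categories2 = c(NA, "parent3"),
  parent_categories3 = c(NA, "parent4"),
  stringsAsFactors = FALSE
)

# Collect the parent_categories* columns and order them by numeric suffix,
# so that e.g. parent_categories10 would sort after parent_categories9
parent_cols <- grep("^parent_categories[0-9]+$", names(df), value = TRUE)
parent_cols <- parent_cols[order(as.integer(sub("^parent_categories", "", parent_cols)))]

# Put the parent columns between categoryName and url
df <- df[, c("categoryName", parent_cols, "url")]
names(df)
```

Selecting columns by name rather than by position keeps this working when a different input produces more or fewer parent columns.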

Tyler Rinker answered Sep 29 '22 14:09