
Transform a dataframe into a tree structure list of lists

I have a data.frame with two columns representing a hierarchical tree, with parents and nodes.

I want to transform its structure so that I can use it as input to the d3Tree function from the d3Network package.

Here's my data frame:

df <- data.frame(c("Canada","Canada","Quebec","Quebec","Ontario","Ontario"),c("Quebec","Ontario","Montreal","Quebec City","Toronto","Ottawa"))
names(df) <- c("parent","child")

And I want to transform it into this structure:

Canada_tree <- list(name = "Canada", children = list(
  list(name = "Quebec",
       children = list(list(name = "Montreal"), list(name = "Quebec City"))),
  list(name = "Ontario",
       children = list(list(name = "Toronto"), list(name = "Ottawa")))))

I have successfully transformed this particular case using the code below:

fill_list <- function(df,node) {
  node <- as.character(node)
  if (is.leaf(df,node)==TRUE){
    return (list(name = node))
  }
  else {
    new_node = df[df[,1] == node,2]

    return (list(name = node, children =  list(fill_list(df,new_node[1]),fill_list(df,new_node[2]))))
  }
}

The problem is that it only works for trees in which every parent node has exactly two children. You can see that I hard-coded the two children (new_node[1] and new_node[2]) as inputs to my recursive function.

I'm trying to figure out a way to call the recursive function once for each of the parent node's children. Example:

fill_list(df,new_node[1]),...,fill_list(df,new_node[length(new_node)])

I tried these three possibilities, but none of them worked:

First: Creating a string with all the function calls and parameters and then evaluating it. It returns this error: could not find function fill_functional(df,new_node[1]). That's because my function wasn't created by the time I called it, after all.

fill_functional <- function(df,node) {
  node <- as.character(node)
  if (is.leaf(df,node)==TRUE){
    return (list(name = node))
  }
  else {
    new_node = df[df[,1] == node,2]
    level <- length(new_node)
    xxx <- paste0("(df,new_node[",seq(level),"])")
    lapply(xxx,function(x) eval(call(paste("fill_functional",x,sep=""))))

  }
}

Second: Using a for loop. But I only got the children of my root node.

L <- list()
fill_list <- function(df,node) {
  node <- as.character(node)
  if (is.leaf(df,node)==TRUE){
    return (list(name = node))
  }
  else {
    new_node = df[df[,1] == node,2]

    for (i in 1:length(new_node)){
      L[i] <- (fill_list(df,new_node[i]))
    }

    return (list(name = node, children = L))
  }
}
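
A likely cause of that result, offered as a reading of the code and not a confirmed diagnosis: `L[i] <- ...` with single brackets keeps only the first element of the list returned by the recursive call (the child's name), so the nested `children` are dropped. Below is a minimal sketch of a loop-based variant under that assumption. The name `fill_list_loop` is hypothetical, and the `is.leaf` helper (not shown in the question) is replaced by a check on the number of children.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns as character.
df <- data.frame(
  parent = c("Canada", "Canada", "Quebec", "Quebec", "Ontario", "Ontario"),
  child  = c("Quebec", "Ontario", "Montreal", "Quebec City", "Toronto", "Ottawa"),
  stringsAsFactors = FALSE
)

# Hypothetical loop-based variant of fill_list.
fill_list_loop <- function(df, node) {
  node <- as.character(node)
  # Children of `node`: second-column values of rows whose first column equals it.
  new_node <- as.character(df[df[, 1] == node, 2])
  if (length(new_node) == 0) {
    return(list(name = node))  # leaf: no children
  }
  # Local list sized to the number of children.
  L <- vector("list", length(new_node))
  for (i in seq_along(new_node)) {
    # Double brackets store the whole subtree as a single list element.
    L[[i]] <- fill_list_loop(df, new_node[i])
  }
  list(name = node, children = L)
}

tree <- fill_list_loop(df, "Canada")
str(tree)
```

Using `seq_along(new_node)` instead of `1:length(new_node)` also avoids iterating over `1:0` if the vector is ever empty.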

Third: Creating a function that populates a list with elements that are functions, and just changing the arguments. But I wasn't able to accomplish anything interesting, and I'm afraid I'll have the same problem as I did on my first try described above.

Asked by felipeformenti on May 23 '14 at 22:05


1 Answer

Here is a recursive definition:

maketreelist <- function(df, root = df[1, 1]) {
  if(is.factor(root)) root <- as.character(root)
  r <- list(name = root)
  children = df[df[, 1] == root, 2]
  if(is.factor(children)) children <- as.character(children)
  if(length(children) > 0) {
    r$children <- lapply(children, maketreelist, df = df)
    }
  r
  }

canadalist <- maketreelist(df)

That produces the structure you want. This function assumes that the first column of the data.frame (or matrix) you pass in contains the parent and the second column contains the child. It also takes a root parameter, which allows you to specify a starting point; it defaults to the first parent in the list.

But if you are really interested in playing around with trees, the igraph package might be of interest:
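
As a quick illustration of the `root` parameter, here is a small sketch that builds the full tree and then only the subtree below "Quebec", and compares the full result with the hand-written structure from the question. The data frame is rebuilt with `stringsAsFactors = FALSE` (an assumption, so the example behaves the same across R versions), and `quebeclist` is a name introduced only for this example.

```r
# Sample data from the question.
df <- data.frame(
  parent = c("Canada", "Canada", "Quebec", "Quebec", "Ontario", "Ontario"),
  child  = c("Quebec", "Ontario", "Montreal", "Quebec City", "Toronto", "Ottawa"),
  stringsAsFactors = FALSE
)

# maketreelist() as defined above, repeated so the example runs on its own.
maketreelist <- function(df, root = df[1, 1]) {
  if (is.factor(root)) root <- as.character(root)
  r <- list(name = root)
  children <- df[df[, 1] == root, 2]
  if (is.factor(children)) children <- as.character(children)
  if (length(children) > 0) {
    r$children <- lapply(children, maketreelist, df = df)
  }
  r
}

# Default root: the first parent in the data frame ("Canada").
canadalist <- maketreelist(df)

# Explicit root: only the subtree below "Quebec" is built.
quebeclist <- maketreelist(df, root = "Quebec")
str(quebeclist)

# Hand-written target structure from the question, for comparison.
Canada_tree <- list(name = "Canada", children = list(
  list(name = "Quebec",
       children = list(list(name = "Montreal"), list(name = "Quebec City"))),
  list(name = "Ontario",
       children = list(list(name = "Toronto"), list(name = "Ottawa")))))

# Compare the generated tree with the hand-written one.
print(identical(canadalist, Canada_tree))
```

Because `lapply` runs once per child, the same function handles parents with any number of children.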

library(igraph)
g <- graph.data.frame(df)
plot(g)

[Image: Canada tree plotted with igraph]

Answered by MrFlick on Oct 18 '22 at 21:10