
How to build a dendrogram from a directory tree?

Tags: r, dendrogram

Given an absolute root directory path, how do I generate a dendrogram object of all paths below it so that I can visualize the directory tree with R?

Suppose the following call returned the following leaf nodes.

list.files(path, full.names = TRUE, recursive = TRUE)

    root/a/some/file.R
    root/a/another/file.R
    root/a/another/cool/file.R
    root/b/some/data.csv
    root/b/more/data.csv

I'd like to make a plot in R like the output of the unix tree program:

    root
    ├── a
    │   ├── another
    │   │   ├── cool
    │   │   │   └── file.R
    │   │   └── file.R
    │   └── some
    │       └── file.R
    └── b
        ├── more
        │   └── data.csv
        └── some
            └── data.csv

It would be especially useful if the solution involved decomposing the file system tree into two data.frames:

  1. a table of nodes (with which I could include attributes such as modification date)
  2. and a table of edges (also with attributes)

And then building the dendrogram object from those two data.frames.

wdkrnls asked Mar 18 '16 21:03


2 Answers

Here's a possible approach to get what you originally asked for, which is output like that of the tree program. It gives a data.tree object that is quite flexible and could be made to plot the way you want, though it's not entirely clear to me what you're after:

    path <- c(
        "root/a/some/file.R",
        "root/a/another/file.R",
        "root/a/another/cool/file.R",
        "root/b/some/data.csv",
        "root/b/more/data.csv"
    )

    library(data.tree)
    library(plyr)

    # Split each path into its components, one column per level (NA-padded)
    x <- lapply(strsplit(path, "/"), function(z) as.data.frame(t(z)))
    x <- rbind.fill(x)
    # Rebuild the full path per row; as.Node reads it from the pathString column
    x$pathString <- apply(x, 1, function(x) paste(trimws(na.omit(x)), collapse="/"))
    (mytree <- data.tree::as.Node(x))

    1  root
    2   ¦--a
    3   ¦   ¦--some
    4   ¦   ¦   °--file.R
    5   ¦   °--another
    6   ¦       ¦--file.R
    7   ¦       °--cool
    8   ¦           °--file.R
    9   °--b
    10      ¦--some
    11      ¦   °--data.csv
    12      °--more
    13          °--data.csv

    plot(mytree)

You can get the parts you want (I think), but you'll need to do the legwork of converting between data types in data.tree: https://cran.r-project.org/web/packages/data.tree/vignettes/data.tree.html#tree-conversion

I use this approach in my pathr package's tree function when use.data.tree = TRUE: https://github.com/trinker/pathr#tree
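As a rough illustration of the conversion step mentioned above, here is a minimal sketch. It assumes that data.tree's ToDataFrameNetwork, ToDataFrameTree and as.dendrogram method behave as described in the package vignette; the object names edges, nodes and dend are only illustrative.

```r
library(data.tree)

path <- c(
    "root/a/some/file.R",
    "root/a/another/file.R",
    "root/a/another/cool/file.R",
    "root/b/some/data.csv",
    "root/b/more/data.csv"
)
mytree <- as.Node(data.frame(pathString = path))

# Edge table: one row per parent/child link (columns `from` and `to`)
edges <- ToDataFrameNetwork(mytree)

# Node table: one row per node, with the requested attributes as columns
nodes <- ToDataFrameTree(mytree, "pathString", "level")

# Dendrogram object built from the data.tree structure
dend <- as.dendrogram(mytree)
```

If this works as assumed, dend can be handed to the usual dendrogram tooling (for example plot), and further node attributes such as a modification date could be set on the tree before converting.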

EDIT: Per @Luke's comment below, data.tree::as.Node takes a data.frame of paths directly:

    (mytree <- data.tree::as.Node(data.frame(pathString = path)))

                     levelName
    1  root
    2   ¦--a
    3   ¦   ¦--some
    4   ¦   ¦   °--file.R
    5   ¦   °--another
    6   ¦       ¦--file.R
    7   ¦       °--cool
    8   ¦           °--file.R
    9   °--b
    10      ¦--some
    11      ¦   °--data.csv
    12      °--more
    13          °--data.csv
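The node and edge tables requested in the question can also be derived from the path strings alone. The following is a base R sketch rather than part of the data.tree approach above; the helper prefixes and the column names path, name, is_leaf, mtime, from and to are hypothetical choices made for illustration.

```r
# Paths as returned by list.files(path, full.names = TRUE, recursive = TRUE)
paths <- c(
    "root/a/some/file.R",
    "root/a/another/file.R",
    "root/a/another/cool/file.R",
    "root/b/some/data.csv",
    "root/b/more/data.csv"
)

# Hypothetical helper: every prefix of a path is a node,
# e.g. "root", "root/a", "root/a/some", "root/a/some/file.R"
prefixes <- function(p) {
    parts <- strsplit(p, "/", fixed = TRUE)[[1]]
    vapply(seq_along(parts),
           function(i) paste(parts[seq_len(i)], collapse = "/"),
           character(1))
}
node_paths <- unique(unlist(lapply(paths, prefixes)))
node_paths <- node_paths[nzchar(node_paths)]  # drop the empty prefix of absolute paths

# Node table: one row per directory or file
nodes <- data.frame(
    path = node_paths,
    name = basename(node_paths),
    is_leaf = node_paths %in% paths,
    stringsAsFactors = FALSE
)
# Attribute example: modification time (NA for paths that do not exist on disk)
nodes$mtime <- file.info(nodes$path)$mtime

# Edge table: every node whose parent directory is also a node gets one edge
has_parent <- dirname(nodes$path) %in% nodes$path
edges <- data.frame(
    from = dirname(nodes$path[has_parent]),
    to = nodes$path[has_parent],
    stringsAsFactors = FALSE
)
```

Keying both tables on the full path rather than the base name keeps nodes such as the two "some" directories distinct, at the cost of longer identifiers.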
Tyler Rinker answered Oct 08 '22 12:10


It's worth adding that the excellent fs package offers a dir_tree function that brings this functionality to R in a very convenient manner.

    tmp_dir <- tempdir()
    # Create some directories
    for (i in 1:10) {
        dir.create(path = file.path(tmp_dir,
                                    basename(tempfile(pattern = "dir")),
                                    basename(tempfile(pattern = "sub_dir"))),
                   recursive = TRUE)
    }
    # Create directory tree
    fs::dir_tree(path = tmp_dir, recurse = TRUE)

Results

    /tmp/RtmpEhB0ne
    ├── dir15213121dd5903
    │   └── sub_dir1521315a5425ba
    ├── dir152131227b086f
    │   └── sub_dir1521314255d96b
    ├── dir152131353e6603
    │   └── sub_dir1521315b52aeed
    ├── dir15213136870535
    │   └── sub_dir15213127b34f64
    ├── dir1521313bbf738b
    │   └── sub_dir152131473939ea
    ├── dir152131403f4fd5
    │   └── sub_dir152131115296e7
    ├── dir152131503d0d55
    │   └── sub_dir15213114368572
    ├── dir1521316f0bb0c3
    │   └── sub_dir1521314aea266b
    ├── dir1521317fe305e9
    │   └── sub_dir152131bcfe8a
    └── dir1521319800dfb
        └── sub_dir15213129defd4a

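In addition to printing the directory tree, dir_tree returns the discovered paths, so they can be stored in an object (sink is used here only to keep the tree from being printed to the console).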

    sink(file = tempfile(fileext = ".log"))
    res_fs_tree <- fs::dir_tree(path = tmp_dir, recurse = TRUE)
    sink()
    res_fs_tree[[1]]
    # [1] "/tmp/RtmpEhB0ne/dir15213121dd5903/sub_dir1521315a5425ba"
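Since the question also asks for node attributes such as modification date, fs::dir_info may be a useful companion to dir_tree. The sketch below assumes that dir_info returns one row per path with path, type and modification_time columns; the scratch directory demo_root and the nodes/edges column names are hypothetical and exist only for this illustration.

```r
library(fs)

# Hypothetical scratch directory, created only for this illustration
demo_root <- file.path(tempdir(), "dir_info_demo")
dir.create(file.path(demo_root, "a", "some"), recursive = TRUE)
file.create(file.path(demo_root, "a", "some", "file.R"))

# One row per discovered path below demo_root, with file metadata
info <- fs::dir_info(demo_root, recurse = TRUE)

# Node table with a couple of attributes
nodes <- data.frame(
    path = as.character(info$path),
    type = as.character(info$type),
    modified = info$modification_time,
    stringsAsFactors = FALSE
)

# Edge table: parent directory -> path
edges <- data.frame(
    from = as.character(fs::path_dir(info$path)),
    to = as.character(info$path),
    stringsAsFactors = FALSE
)
```

The root directory itself does not appear as a row in info, so it only shows up in the from column of edges.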
Konrad answered Oct 08 '22 13:10