I have created a simple `data.tree` by importing a folder structure that contains files:
```r
# Install pacman if needed, then install/load pathr from GitHub
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/pathr")
library(pathr)
library(data.tree)
folder_structure <- pathr::tree(path = "/Users/username/Downloads/top_level/",
                                use.data.tree = TRUE, include.files = TRUE)
```
Now, I would like to convert the object `folder_structure` into a `data.frame` with one row per folder and a column that specifies how many files each folder contains. How can I accomplish this?
For example, I have this very simple folder structure:
```
top_level_folder
    sub_folder_1
        file1.txt
    sub_folder_2
        file2.txt
```
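To experiment without touching real folders, the layout above can be recreated under a temporary directory. This is a minimal sketch: `tempfile()` only supplies a unique scratch location (an assumption for testing), and the folder and file names are copied from the example.

```r
# Create the example layout under a fresh, unique temporary directory
root <- file.path(tempfile("tree_demo_"), "top_level_folder")
for (sub in c("sub_folder_1", "sub_folder_2")) {
  dir.create(file.path(root, sub), recursive = TRUE)
}

# One empty file in each subfolder, none directly in top_level_folder
invisible(file.create(
  file.path(root, "sub_folder_1", "file1.txt"),
  file.path(root, "sub_folder_2", "file2.txt")
))

# List everything below root, folders included, as relative paths
list.files(root, recursive = TRUE, include.dirs = TRUE)
```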
The output I am after looks like this:
```
Folders           Files
top_level_folder      0
sub_folder_1          1
sub_folder_2          1
```
The first column can simply be generated by calling `list.dirs("/Users/username/Downloads/top_level/")`, but I don't know how to generate the second column. Note that the second column is non-recursive: files within subfolders are not counted (i.e. `top_level_folder` contains 0 files, even though its subfolders contain 2 files).
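If the files already live in a `data.tree`, the per-folder count can in principle be read off the tree itself. The sketch below does not use `pathr`: it builds a small tree with `data.tree::as.Node()` from a hypothetical `files` data frame of file paths, so that every leaf is a file and every inner node is a folder. Whether `pathr::tree()` produces a tree of exactly this shape is an assumption not checked here, and empty folders cannot be represented this way (they would look like files).

```r
library(data.tree)

# Hypothetical input: one row per file, as a path starting at the top folder
files <- data.frame(
  pathString = c("top_level_folder/sub_folder_1/file1.txt",
                 "top_level_folder/sub_folder_2/file2.txt"),
  stringsAsFactors = FALSE
)
tree <- as.Node(files)

# Folders are the non-leaf nodes, visited in pre-order (parents first)
folders <- Traverse(tree, filterFun = isNotLeaf)

# Non-recursive count: only direct children that are leaves (files)
n_direct_files <- function(node) {
  sum(vapply(node$children, function(child) child$isLeaf, logical(1)))
}

result <- data.frame(
  Folders = unname(Get(folders, "name")),
  Files   = unname(Get(folders, n_direct_files)),
  stringsAsFactors = FALSE
)
result
```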
If you want to see whether your solution scales, download the Rails codebase from https://github.com/rails/rails/archive/master.zip and try it on Rails' more complex file structure.
`list.dirs()` returns a vector of every directory reachable from a starting folder, including the starting folder itself when `recursive = TRUE`, so that handles the first column of your data frame. Very convenient.
```r
# Get a vector of all the directories and subdirectories from this folder
dir <- "."
xs <- list.dirs(dir, recursive = TRUE)
```
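A quick check on a throwaway copy of the example folders (created under `tempfile()`, an assumption for the demo) shows that the starting folder itself is part of the result, so `top_level_folder` gets its own row.

```r
# Throwaway copy of the example folders (no files needed here)
root <- file.path(tempfile("dirs_demo_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"), recursive = TRUE)

# recursive = TRUE also returns root itself; full.names = TRUE is the default
dirs <- list.dirs(root, recursive = TRUE)
basename(dirs)
```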
`list.files()` can tell us the contents of each of those folders, but it returns both files and folders. We just want the files, so to get the count we need to filter the output of `list.files()` with a predicate. `file.info()` can tell us whether a given path is a directory or not, so we build our predicate from that.
```r
# Helpers to check whether a path is a folder or a file
is_dir <- function(x) file.info(x)[["isdir"]]
is_file <- Negate(is_dir)
```
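One caveat with this predicate: `file.info()` reports `isdir = NA` for a path it cannot stat (a broken symbolic link, for instance), so `Negate(is_dir)` yields `NA` and a later `sum()` would return `NA` too. A swapped-in alternative, not part of the original answer, is `utils::file_test("-f", x)`, which is `TRUE` only for existing non-directories and `FALSE` otherwise. The name `is_file2` and the scratch paths are hypothetical.

```r
# Alternative file predicate using utils::file_test().
# "-f" is TRUE only for paths that exist and are not directories,
# so a missing path gives FALSE instead of NA.
is_file2 <- function(x) utils::file_test("-f", x)

# Scratch folder with one file, plus a path that does not exist
tmp_dir  <- tempfile("pred_demo_")
dir.create(tmp_dir)
tmp_file <- file.path(tmp_dir, "a.txt")
invisible(file.create(tmp_file))

checks <- is_file2(c(tmp_dir, tmp_file, file.path(tmp_dir, "missing.txt")))
checks  # FALSE TRUE FALSE
```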
Next, we count the files in a single folder. Summing logical values returns the number of `TRUE` cases.
```r
# Count the files in a single folder (non-recursive).
# list.files() skips hidden (dot) files unless all.files = TRUE.
count_files_in_one_dir <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  sum(is_file(files))
}
```
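To see the non-recursive behaviour, the helpers can be run on a temporary copy of the question's example (the `tempfile()` location is only an assumption for the demo). The first line shows the counting trick: `sum()` treats `TRUE` as 1 and `FALSE` as 0.

```r
# Logical values sum as 1/0
sum(c(TRUE, FALSE, TRUE))  # 2

# Helpers from above
is_dir  <- function(x) file.info(x)[["isdir"]]
is_file <- Negate(is_dir)
count_files_in_one_dir <- function(dir) {
  files <- list.files(dir, full.names = TRUE)
  sum(is_file(files))
}

# Example layout: one file in each subfolder, none at the top level
root <- file.path(tempfile("count_demo_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"), recursive = TRUE)
invisible(file.create(file.path(root, "sub_folder_1", "file1.txt"),
                      file.path(root, "sub_folder_2", "file2.txt")))

# The top folder holds only folders, so its own count is 0
top_count <- count_files_in_one_dir(root)
sub_count <- count_files_in_one_dir(file.path(root, "sub_folder_1"))
c(top = top_count, sub_folder_1 = sub_count)
```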
For convenience, we wrap that function to make it work on many folders.
```r
# Vectorized version of the above
count_files_in_dir <- function(dir) {
  vapply(dir, count_files_in_one_dir, numeric(1), USE.NAMES = FALSE)
}
```
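Putting the pieces together on the question's small example gives the requested `Folders`/`Files` layout. This sketch swaps `tibble` for a base `data.frame` and uses `basename()` to shorten full paths to folder names; both are illustrative choices. On a real tree, `basename()` can repeat a name when different branches contain folders with the same name, so the full paths from `list.dirs()` are the safer key.

```r
# Helpers from above, repeated so the block runs on its own
is_dir  <- function(x) file.info(x)[["isdir"]]
is_file <- Negate(is_dir)
count_files_in_one_dir <- function(dir) {
  sum(is_file(list.files(dir, full.names = TRUE)))
}
count_files_in_dir <- function(dir) {
  vapply(dir, count_files_in_one_dir, numeric(1), USE.NAMES = FALSE)
}

# Example layout from the question in a temporary directory
root <- file.path(tempfile("table_demo_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"), recursive = TRUE)
invisible(file.create(file.path(root, "sub_folder_1", "file1.txt"),
                      file.path(root, "sub_folder_2", "file2.txt")))

# One row per folder, with its non-recursive file count
dirs <- list.dirs(root, recursive = TRUE)
result <- data.frame(
  Folders = basename(dirs),
  Files   = count_files_in_dir(dirs),
  stringsAsFactors = FALSE
)
result
```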
Now we can count the files.
```r
df <- tibble::tibble(
  dir = xs,
  nfiles = count_files_in_dir(xs))
df
#> # A tibble: 688 x 2
#>    dir                                                nfiles
#>    <chr>                                               <dbl>
#>  1 .                                                      11
#>  2 ./.github                                               3
#>  3 ./actioncable                                           7
#>  4 ./actioncable/app                                       0
#>  5 ./actioncable/app/assets                                0
#>  6 ./actioncable/app/assets/javascripts                    1
#>  7 ./actioncable/app/assets/javascripts/action_cable       5
#>  8 ./actioncable/bin                                       1
#>  9 ./actioncable/lib                                       1
#> 10 ./actioncable/lib/action_cable                          8
#> # ... with 678 more rows
```
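On a large tree such as Rails, calling `list.files()` and `file.info()` once per folder means many separate file-system queries. A possible alternative, sketched here and not benchmarked, lists all files in a single recursive call and tallies them by parent folder with `dirname()`. The function name `count_files_fast` is hypothetical. One assumption to be aware of: with the default `all.files = FALSE`, hidden entries are skipped, so counts for hidden folders such as `./.github` may not match the per-folder version above.

```r
# Hypothetical helper: one recursive listing instead of one listing per folder
count_files_fast <- function(root) {
  # Normalise once so both listings spell the root path identically
  root  <- normalizePath(root, winslash = "/", mustWork = TRUE)
  dirs  <- list.dirs(root, recursive = TRUE)
  # With recursive = TRUE and the default include.dirs = FALSE,
  # only files (not folders) are returned
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  # Count files per parent folder; using dirs as factor levels keeps
  # folders with no files in the result with a count of 0
  counts <- table(factor(dirname(files), levels = dirs))
  data.frame(dir = dirs, nfiles = as.vector(counts), stringsAsFactors = FALSE)
}

# Demo on the question's example layout in a temporary directory
root <- file.path(tempfile("fast_demo_"), "top_level_folder")
dir.create(file.path(root, "sub_folder_1"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_2"), recursive = TRUE)
invisible(file.create(file.path(root, "sub_folder_1", "file1.txt"),
                      file.path(root, "sub_folder_2", "file2.txt")))
res <- count_files_fast(root)
res
```

Because of `normalizePath()`, the `dir` column holds absolute paths rather than the `./...` form produced by `list.dirs(".")`.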