I'm doing a disk space report that uses File::Find to collect cumulative sizing in a directory tree.
What I get (easily) from File::Find is the directory name, e.g.:
/path/to/user/username/subdir/anothersubdir/etc
I'm running File::Find to collect sizes beneath:
/path/to/user/username
and build a cumulative size report of the directory and each of its subdirectories.
What I've currently got is:
while ( $dir_tree ) {
    $results{$dir_tree} += $blocks * $block_size;
    my @path_arr = split( "/", $dir_tree );
    pop(@path_arr);
    $dir_tree = join( "/", @path_arr );
}
(And yes, I know that's not very nice.)
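For what it's worth, the same roll-up can also be written with File::Basename's dirname() rather than split/pop/join; a rough sketch, assuming $dir_tree is already relative to the starting path:
use File::Basename qw( dirname );

# Add the file's size to its directory and every ancestor, stopping
# once dirname() reaches the top of the relative path.
for ( my $d = $dir_tree; length $d; $d = dirname($d) ) {
    $results{$d} += $blocks * $block_size;
    last if dirname($d) eq $d || dirname($d) eq '.';
}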
The purpose of doing this is that when I stat each file, I add its size to the current node and to each parent node in the tree.
This is sufficient to generate:
username,300M
username/documents,150M
username/documents/excel,50M
username/documents/word,40M
username/work,70M
username/fish,50M
username/some_other_stuff,30M
But I'd now like to turn that into JSON, more like this:
{
    "name" : "username",
    "size" : "307200",
    "children" : [
        {
            "name" : "documents",
            "size" : "153750",
            "children" : [
                {
                    "name" : "excel",
                    "size" : "51200"
                },
                {
                    "name" : "word",
                    "size" : "81920"
                }
            ]
        }
    ]
}
That's because I'm intending to do a D3 visualisation of this structure, loosely based on the D3 Zoomable Circle Pack.
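In Perl terms, that target is just a nested hashref in which children is an arrayref of further node hashrefs, and the JSON module will serialise it directly. A hand-built sketch using the same illustrative numbers:
use JSON qw( encode_json );

# The shape D3 expects: each node has a name, a size, and optionally
# a children arrayref of further nodes.
my $tree = {
    name     => 'username',
    size     => 307200,
    children => [
        {   name     => 'documents',
            size     => 153750,
            children => [
                { name => 'excel', size => 51200 },
                { name => 'word',  size => 81920 },
            ],
        },
    ],
};
print encode_json($tree);
(encode_json will emit the sizes as numbers rather than quoted strings, which should be fine for the visualisation.)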
So my question is this: what is the neatest way to collate my data so that I have cumulative (and ideally also non-cumulative) sizing information, while populating the hash hierarchically?
I was thinking in terms of a 'cursor' approach (and using File::Spec this time):
use File::Spec;

my $data;
my $cursor = \$data;
foreach my $element ( File::Spec->splitdir( $File::Find::dir ) ) {
    $cursor->{size} += $blocks * $block_size;
    $cursor = $cursor->{$element};
}
Although... that's not quite creating the data structure I'm looking for, not least because we basically have to search by hash key to do the 'rolling up' part of the process.
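For reference, a cursor that does descend one node per path element, adding the file's size to every ancestor on the way down, might look more like the sketch below. It assumes $relative_dir holds $File::Find::dir with the starting path already stripped, and it keeps children in a hash keyed by name (so they would still need flattening into arrays for the JSON above):
use File::Spec;

my %tree = ( name => 'username', size => 0, children => {} );

# Per file: the root and every directory on the relative path each
# accumulate the file's size; nodes are created as needed.
# $relative_dir: $File::Find::dir with the starting path stripped (assumed).
my $cursor = \%tree;
$cursor->{size} += $blocks * $block_size;
for my $element ( File::Spec->splitdir($relative_dir) ) {
    next unless length $element;
    $cursor = $cursor->{children}{$element}
        //= { name => $element, size => 0, children => {} };
    $cursor->{size} += $blocks * $block_size;
}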
Is there a better way of accomplishing this?
Edit - more complete example of what I have already:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Data::Dumper;
my $block_size = 1024;
sub collate_sizes {
    my ( $results_ref, $starting_path ) = @_;
    $starting_path =~ s,/\w+$,/,;
    if ( -f $File::Find::name ) {
        print "$File::Find::name is a file\n";
        my ($dev, $ino, $mode, $nlink, $uid,
            $gid, $rdev, $size, $atime, $mtime,
            $ctime, $blksize, $blocks
        ) = stat($File::Find::name);
        my $dir_tree = $File::Find::dir;
        $dir_tree =~ s|^$starting_path||g;

        # Add this file's blocks to its directory and every ancestor.
        while ($dir_tree) {
            print "Updating $dir_tree\n";
            $$results_ref{$dir_tree} += $blocks * $block_size;
            my @path_arr = split( "/", $dir_tree );
            pop(@path_arr);
            $dir_tree = join( "/", @path_arr );
        }
    }
}

my @users = qw ( user1 user2 );
foreach my $user (@users) {
    my $path = "/home/$user";
    print "$path\n";

    my %results;
    File::Find::find(
        {   wanted   => sub { collate_sizes( \%results, $path ) },
            no_chdir => 1
        },
        $path
    );

    print Dumper \%results;

    # Would print this to a file in the homedir - to STDOUT for convenience.
    foreach my $key ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "$key => $results{$key}\n";
    }
}
And yes, I know this isn't portable and does a few somewhat nasty things; part of what I'm doing here is trying to improve on that. (But currently it's a Unix-based homedir structure, so that's fine.)
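One way to bridge from the flat %results above to the JSON shape I want is a post-processing pass: walk the paths shortest-first, build nested nodes, and attach each node to its parent's children array. A sketch, with a hypothetical results_to_tree() helper; it relies on the roll-up loop having created an entry for every ancestor path:
use JSON qw( encode_json );

sub results_to_tree {
    my ($results_ref) = @_;
    my %nodes;    # relative path => node, so parents can be looked up
    my $root;
    for my $path ( sort keys %$results_ref ) {    # parents sort before children
        my @parts = split m{/}, $path;
        my $node = $nodes{$path} = {
            name => $parts[-1],
            size => $results_ref->{$path},
        };
        if ( @parts > 1 ) {
            my $parent = join '/', @parts[ 0 .. $#parts - 1 ];
            push @{ $nodes{$parent}{children} }, $node;
        }
        else {
            $root = $node;    # e.g. "username" itself
        }
    }
    return $root;
}

# e.g. print encode_json( results_to_tree( \%results ) );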
If you do your own dir scanning instead of using File::Find, you naturally get the right structure.
sub _scan {
    my ($qfn, $fn) = @_;

    my $node = { name => $fn };

    lstat($qfn)
        or die $!;

    my $size   = -s _;
    my $is_dir = -d _;

    if ($is_dir) {
        my @child_fns = do {
            opendir(my $dh, $qfn)
                or die $!;

            grep !/^\.\.?\z/, readdir($dh);
        };

        my @children;
        for my $child_fn (@child_fns) {
            my $child_node = _scan("$qfn/$child_fn", $child_fn);
            $size += $child_node->{size};
            push @children, $child_node;
        }

        $node->{children} = \@children;
    }

    $node->{size} = $size;

    return $node;
}
Rest of the code:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'recursion';
use File::Basename qw( basename );
use JSON qw( encode_json );
...
sub scan { _scan($_[0], basename($_[0])) }
print(encode_json(scan($ARGV[0] // '.')));
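If you want something more human-readable than the compact string encode_json produces, the module's OO interface can pretty-print instead:
# Indented output with stable key order, handy while debugging:
print JSON->new->pretty->canonical->encode( scan( $ARGV[0] // '.' ) );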