I'm doing a disk space report that uses File::Find to collect cumulative sizing in a directory tree.
What I get (easily) from File::Find is the directory name, e.g.:
/path/to/user/username/subdir/anothersubdir/etc
I'm running File::Find to collect sizes beneath:
/path/to/user/username
and build a cumulative size report of the directory and each of its subdirectories.
What I've currently got is:
while ( $dir_tree ) {
    $results{$dir_tree} += $blocks * $block_size;
    my @path_arr = split( "/", $dir_tree );
    pop(@path_arr);
    $dir_tree = join( "/", @path_arr );
}
(And yes, I know that's not very nice.)
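For what it's worth, the same roll-up can also be written with File::Basename's dirname() rather than split/pop/join; a rough sketch, assuming $dir_tree is already relative to the starting path:
use File::Basename qw( dirname );

# Add the file's size to its directory and every ancestor, stopping
# once dirname() reaches the top of the relative path.
for ( my $d = $dir_tree; length $d; $d = dirname($d) ) {
    $results{$d} += $blocks * $block_size;
    last if dirname($d) eq $d || dirname($d) eq '.';
}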
The purpose of doing this is that when I stat each file, I add its size to the current node and to each parent node in the tree.
This is sufficient to generate:
username,300M
username/documents,150M
username/documents/excel,50M
username/documents/word,40M
username/work,70M
username/fish,50M
username/some_other_stuff,30M
But I'd now like to turn that into JSON, more like this:
{
    "name" : "username",
    "size" : "307200",
    "children" : [
        {
            "name" : "documents",
            "size" : "153750",
            "children" : [
                {
                    "name" : "excel",
                    "size" : "51200"
                },
                {
                    "name" : "word",
                    "size" : "81920"
                }
            ]
        }
    ]
}
That's because I'm intending to do a D3 visualisation of this structure, loosely based on the D3 Zoomable Circle Pack.
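In Perl terms, that target is just a nested hashref in which children is an arrayref of further node hashrefs, and the JSON module will serialise it directly. A hand-built sketch using the same illustrative numbers:
use JSON qw( encode_json );

# The shape D3 expects: each node has a name, a size, and optionally
# a children arrayref of further nodes.
my $tree = {
    name     => 'username',
    size     => 307200,
    children => [
        {   name     => 'documents',
            size     => 153750,
            children => [
                { name => 'excel', size => 51200 },
                { name => 'word',  size => 81920 },
            ],
        },
    ],
};
print encode_json($tree);
(encode_json will emit the sizes as numbers rather than quoted strings, which should be fine for the visualisation.)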
So my question is this: what is the neatest way to collate my data so that I have cumulative (and ideally also non-cumulative) sizing information, while populating the hash hierarchically?
I was thinking in terms of a 'cursor' approach (and using File::Spec this time):
use File::Spec;

my $data;
my $cursor = \$data;
foreach my $element ( File::Spec->splitdir( $File::Find::dir ) ) {
    $cursor->{size} += $blocks * $block_size;
    $cursor = $cursor->{$element};
}
Although... that's not quite creating the data structure I'm looking for, not least because we basically have to search by hash key to do the 'rolling up' part of the process.
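For reference, a cursor that does descend one node per path element, adding the file's size to every ancestor on the way down, might look more like the sketch below. It assumes $relative_dir holds $File::Find::dir with the starting path already stripped, and it keeps children in a hash keyed by name (so they would still need flattening into arrays for the JSON above):
use File::Spec;

my %tree = ( name => 'username', size => 0, children => {} );

# Per file: the root and every directory on the relative path each
# accumulate the file's size; nodes are created as needed.
# $relative_dir: $File::Find::dir with the starting path stripped (assumed).
my $cursor = \%tree;
$cursor->{size} += $blocks * $block_size;
for my $element ( File::Spec->splitdir($relative_dir) ) {
    next unless length $element;
    $cursor = $cursor->{children}{$element}
        //= { name => $element, size => 0, children => {} };
    $cursor->{size} += $blocks * $block_size;
}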
Is there a better way of accomplishing this?
Edit - more complete example of what I have already:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Data::Dumper;
my $block_size = 1024;
sub collate_sizes {
    my ( $results_ref, $starting_path ) = @_;
    $starting_path =~ s,/\w+$,/,;
    if ( -f $File::Find::name ) {
        print "$File::Find::name is a file\n";
        my ($dev, $ino, $mode, $nlink, $uid,
            $gid, $rdev, $size, $atime, $mtime,
            $ctime, $blksize, $blocks
        ) = stat($File::Find::name);
        my $dir_tree = $File::Find::dir;
        $dir_tree =~ s|^$starting_path||g;

        # Add this file's blocks to its directory and every ancestor.
        while ($dir_tree) {
            print "Updating $dir_tree\n";
            $$results_ref{$dir_tree} += $blocks * $block_size;
            my @path_arr = split( "/", $dir_tree );
            pop(@path_arr);
            $dir_tree = join( "/", @path_arr );
        }
    }
}

my @users = qw ( user1 user2 );
foreach my $user (@users) {
    my $path = "/home/$user";
    print "$path\n";

    my %results;
    File::Find::find(
        {   wanted   => sub { collate_sizes( \%results, $path ) },
            no_chdir => 1
        },
        $path
    );

    print Dumper \%results;

    # Would print this to a file in the homedir - to STDOUT for convenience.
    foreach my $key ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "$key => $results{$key}\n";
    }
}
And yes, I know this isn't portable and does a few somewhat nasty things; part of what I'm doing here is trying to improve on that. (But currently it's a Unix-based homedir structure, so that's fine.)
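One way to bridge from the flat %results above to the JSON shape I want is a post-processing pass: walk the paths shortest-first, build nested nodes, and attach each node to its parent's children array. A sketch, with a hypothetical results_to_tree() helper; it relies on the roll-up loop having created an entry for every ancestor path:
use JSON qw( encode_json );

sub results_to_tree {
    my ($results_ref) = @_;
    my %nodes;    # relative path => node, so parents can be looked up
    my $root;
    for my $path ( sort keys %$results_ref ) {    # parents sort before children
        my @parts = split m{/}, $path;
        my $node = $nodes{$path} = {
            name => $parts[-1],
            size => $results_ref->{$path},
        };
        if ( @parts > 1 ) {
            my $parent = join '/', @parts[ 0 .. $#parts - 1 ];
            push @{ $nodes{$parent}{children} }, $node;
        }
        else {
            $root = $node;    # e.g. "username" itself
        }
    }
    return $root;
}

# e.g. print encode_json( results_to_tree( \%results ) );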
If you do your own dir scanning instead of using File::Find, you naturally get the right structure.
sub _scan {
    my ($qfn, $fn) = @_;

    my $node = { name => $fn };

    lstat($qfn)
        or die $!;

    my $size   = -s _;
    my $is_dir = -d _;

    if ($is_dir) {
        my @child_fns = do {
            opendir(my $dh, $qfn)
                or die $!;

            grep !/^\.\.?\z/, readdir($dh);
        };

        my @children;
        for my $child_fn (@child_fns) {
            my $child_node = _scan("$qfn/$child_fn", $child_fn);
            $size += $child_node->{size};
            push @children, $child_node;
        }

        $node->{children} = \@children;
    }

    $node->{size} = $size;

    return $node;
}
Rest of the code:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'recursion';
use File::Basename qw( basename );
use JSON qw( encode_json );
...
sub scan { _scan($_[0], basename($_[0])) }
print(encode_json(scan($ARGV[0] // '.')));
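If you want something more human-readable than the compact string encode_json produces, the module's OO interface can pretty-print instead:
# Indented output with stable key order, handy while debugging:
print JSON->new->pretty->canonical->encode( scan( $ARGV[0] // '.' ) );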