How do I sum together file sizes in bash, grouping together the results by date?

Tags: file, bash

On a Linux server that I work with, a process writes randomly-named files at random intervals. Here's a small sample, showing the file size, modification date & time, and file name:

27659   2009-03-09  17:24  APP14452.log
0       2009-03-09  17:24  vim14436.log
20      2009-03-09  17:24  jgU14406.log
15078   2009-03-10  08:06  ySh14450.log
20      2009-03-10  08:06  VhJ14404.log
9044    2009-03-10  15:14  EqQ14296.log
8877    2009-03-10  19:38  Ugp14294.log
8898    2009-03-11  18:21  yzJ14292.log
55629   2009-03-11  18:30  ZjX14448.log
20      2009-03-11  18:31  GwI14402.log
25955   2009-03-12  19:19  lRx14290.log
14989   2009-03-12  19:25  oFw14446.log
20      2009-03-12  19:28  clg14400.log

(Note that sometimes the file size can be zero.)

What I would like is a bash script to sum the size of the files, broken down by date, producing output something like this (assuming my arithmetic is correct):

27679 2009-03-09
33019 2009-03-10
64547 2009-03-11
40964 2009-03-12

The results would show activity trends over time, and highlight the exceptionally busy days.

In SQL, the operation would be a cinch:

SELECT SUM(filesize), filedate
FROM files
GROUP BY filedate;

Now, this is all probably pretty easy in Perl or Python, but I'd really prefer a bash shell or awk solution. It seems especially tricky to me to group the files by date in bash (especially if you can't assume a particular date format). Summing the sizes could be done in a loop I suppose, but is there an easier, more elegant, approach?

yukondude asked Mar 13 '09 16:03


2 Answers

I often use this Awk idiom, which expects the size in the first column and the date in the second:

awk '{sum[$2]+= $1;}END{for (date in sum){print sum[date], date;}}'
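
As a minimal sketch of how the idiom might be applied to the listing from the question, assuming whitespace-separated input with the size in column 1 and the date in column 2. The function name `sum_by_date` is made up for illustration. Because awk's `for (date in sum)` visits keys in no guaranteed order, the sketch pipes the result through `sort -k2` to order it by date.

```shell
#!/usr/bin/env bash
# sum_by_date: hypothetical helper name, not part of the original answer.
# Reads "size date ..." lines on stdin and prints "total date", ordered by date.
sum_by_date() {
  awk '{ sum[$2] += $1 }    # accumulate the size under its date key
       END { for (date in sum) print sum[date], date }' |
    sort -k2                # awk iteration order is unspecified, so sort by date
}

# Sample lines copied from the question (first two dates only).
sum_by_date <<'EOF'
27659   2009-03-09  17:24  APP14452.log
0       2009-03-09  17:24  vim14436.log
20      2009-03-09  17:24  jgU14406.log
15078   2009-03-10  08:06  ySh14450.log
20      2009-03-10  08:06  VhJ14404.log
9044    2009-03-10  15:14  EqQ14296.log
8877    2009-03-10  19:38  Ugp14294.log
EOF
```

For these seven lines it should print `27679 2009-03-09` followed by `33019 2009-03-10`. Zero-byte files need no special handling, since adding 0 leaves the total unchanged.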
ashawley answered Sep 28 '22 02:09


Only files, recursively, summed per date and sorted by date (needs GNU find for -printf):

find ./ -type f -printf '%TY-%Tm-%Td %s\n'|awk '{sum[$1]+= $2;}END{for (date in sum){print date, sum[date];}}'|sort

Only files, from the current directory only, summed per date and sorted by date:

find ./ -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n'|awk '{sum[$1]+= $2;}END{for (date in sum){print date, sum[date];}}'|sort
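
One way to try the pipeline without touching real data is a scratch directory holding files of known size and modification date. This is only a sketch: the wrapper name `sum_sizes_by_mtime_date` and the file names are made up for illustration, and it assumes GNU find (for `-printf`) and GNU touch (for `-d` with a date string).

```shell
#!/usr/bin/env bash
# sum_sizes_by_mtime_date: hypothetical wrapper around the pipeline above.
# Assumes GNU find, since -printf is a GNU extension.
sum_sizes_by_mtime_date() {
  local dir=${1:-.}
  # Emit "YYYY-MM-DD size" per regular file, total per date, then sort by date.
  find "$dir" -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n' |
    awk '{ sum[$1] += $2 } END { for (date in sum) print date, sum[date] }' |
    sort
}

# Scratch directory with files of known size and modification date.
demo=$(mktemp -d)
printf 'aaaaa'    > "$demo/one.log"    # 5 bytes
printf 'bbb'      > "$demo/two.log"    # 3 bytes
printf 'cccccccc' > "$demo/three.log"  # 8 bytes
touch -d '2009-03-09 17:24' "$demo/one.log" "$demo/two.log"
touch -d '2009-03-10 08:06' "$demo/three.log"

sum_sizes_by_mtime_date "$demo"
rm -rf "$demo"
```

With these three files the output should be `2009-03-09 8` and `2009-03-10 8`. Putting the date first means a plain `sort` is enough, because ISO-style dates sort chronologically as text.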
Kristjan Adojaan answered Sep 28 '22 00:09