Command-line utility to print statistics of numbers in Linux

I often find myself with a file that has one number per line. I end up importing it into Excel to view things like the median, standard deviation and so forth.

Is there a command-line utility in Linux that does the same? I usually need the average, median, min, max and standard deviation.

MK. asked Mar 20 '12 15:03




2 Answers

This is a breeze with R. For a file that looks like this:

1
2
3
4
5
6
7
8
9
10

Use this:

R -q -e "x <- read.csv('nums.txt', header = FALSE); summary(x); sd(x[ , 1])"

To get this:

       V1
 Min.   : 1.00
 1st Qu.: 3.25
 Median : 5.50
 Mean   : 5.50
 3rd Qu.: 7.75
 Max.   :10.00
[1] 3.02765
  • The -q flag squelches R's startup licensing and help output
  • The -e flag tells R you'll be passing an expression from the terminal
  • x is a data.frame: a table, basically. It's a structure that holds multiple vectors (columns) of data, which is a little peculiar if you're only reading in a single vector, and it affects which functions you can call on it directly.
  • Some functions, like summary(), naturally accommodate data.frames. If x had multiple fields, summary() would provide the above descriptive stats for each.
  • But sd() can only take one vector at a time, which is why I index x for that command (x[ , 1] returns the first column of x). You could use apply(x, MARGIN = 2, FUN = sd) to get the SDs for all columns.
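If R isn't installed, roughly the same numbers can be sketched with sort and awk, which are on practically every Linux box. This is a minimal sketch, not part of the answer above: the function name `summarize` is hypothetical, it assumes one number per line (blank lines are skipped, non-numeric lines are not checked), and it computes quartiles by linear interpolation between sorted values, which is R's default quantile method (type 7). For the sample file it should agree with R, although R rounds its printed output differently.

```shell
#!/bin/sh
# summarize FILE: hypothetical helper printing an R-like summary() plus sd()
# for a file holding one number per line. Blank lines are skipped; other
# non-numeric lines are NOT validated. Quartiles use linear interpolation.
summarize() {
    sort -n "$1" | awk '
    # Quantile p of the sorted values v[1..n], interpolating between neighbours.
    function q(p,    h, lo) {
        h = (n - 1) * p + 1
        lo = int(h)
        if (lo >= n) return v[n]
        return v[lo] + (h - lo) * (v[lo + 1] - v[lo])
    }
    NF { n++; v[n] = $1 + 0; sum += $1 }
    END {
        if (n == 0) { print "no data" > "/dev/stderr"; exit 1 }
        mean = sum / n
        for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
        printf "Min.   : %g\n", v[1]
        printf "1st Qu.: %g\n", q(0.25)
        printf "Median : %g\n", q(0.5)
        printf "Mean   : %g\n", mean
        printf "3rd Qu.: %g\n", q(0.75)
        printf "Max.   : %g\n", v[n]
        # Sample standard deviation (n - 1 in the denominator), like R sd().
        if (n > 1) printf "SD     : %g\n", sqrt(ss / (n - 1))
    }'
}
```

Run it as `summarize nums.txt`. The median and quartiles need every value in sorted order, which is why the data goes through `sort -n` and is kept in an awk array rather than being processed in a single streaming pass.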
Matt Parker answered Oct 26 '22 02:10



Using "st" (https://github.com/nferraz/st)

$ st numbers.txt
N    min   max   sum   mean  stddev
10   1     10    55    5.5   3.02765

Or:

$ st numbers.txt --transpose
N      10
min    1
max    10
sum    55
mean   5.5
stddev 3.02765

(DISCLAIMER: I wrote this tool :))
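On a machine where st isn't installed, the same six columns can be approximated in one awk pass. This is an illustration, not st's implementation: the function name `stats_stream` and the column layout are assumptions modeled on the output shown above. Instead of storing or sorting the values, it keeps a running mean and a running sum of squared deviations (Welford's online algorithm), so memory use stays constant regardless of file size. The standard deviation uses the n - 1 (sample) denominator, which reproduces the 3.02765 shown for the numbers 1 to 10.

```shell
#!/bin/sh
# stats_stream FILE: hypothetical helper, one pass over FILE (one number per
# line, blank lines skipped) printing N, min, max, sum, mean and sample stddev.
# Uses Welford's online update, so no sorting or array storage is needed.
stats_stream() {
    awk '
    NF {
        x = $1 + 0
        n++
        if (n == 1 || x < min) min = x
        if (n == 1 || x > max) max = x
        sum += x
        d = x - mean          # deviation from the previous running mean
        mean += d / n         # update the running mean
        m2 += d * (x - mean)  # accumulate squared deviations
    }
    END {
        if (n == 0) { print "no data" > "/dev/stderr"; exit 1 }
        sd = (n > 1) ? sqrt(m2 / (n - 1)) : 0
        printf "%-5s %-5s %-5s %-5s %-5s %s\n", "N", "min", "max", "sum", "mean", "stddev"
        printf "%-5d %-5g %-5g %-5g %-5g %g\n", n, min, max, sum, mean, sd
    }' "$1"
}
```

The trade-off is that a median cannot be computed this way without keeping all the values, so this approach only covers the statistics that can be updated one number at a time.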

user2747481 answered Oct 26 '22 03:10
