Convert arbitrary output to json by column in the terminal?

I'd like to be able to pipe the output from any command line program to a command that converts it to json.

For example, the program could accept target columns, a delimiter, and output field names:

# select columns 1 and 3 from the output and convert it to simple json
netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other

and would generate something like this:

[ 
  {"name": "tcp4", "other": "31"},
  {"name": "tcp4", "other": "0"} 
...
]

I'm looking for something that works for any program, not just netstat!

I'm open to installing any third-party tool or open-source project, and I tend to run things on Linux/OSX. It does not have to be a bash script solution; it can be written in node, perl, python, etc.

EDIT: I'm of course willing to pass in any additional info required to make it work, for example a delimiter or multiple delimiters. I'd just like to avoid explicit parsing on the command line and have the tool do that.

Asked by Brad Parks


2 Answers

Filtering STDIN to build a json variable

Introduction

Because the terminal is a very particular kind of interface, with monospaced fonts and tools built for on-screen monitoring, a lot of command output can be very difficult to parse:

netstat output is a good example:

Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ACC ]     STREAM     LISTENING     13947569 @/tmp/.X11-unix/X1
unix  2      [ ]         DGRAM                    8760     /run/systemd/notify
unix  2      [ ACC ]     SEQPACKET  LISTENING     8790     /run/udev/control

Since some lines contain blank fields, the output cannot simply be split on spaces.
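
To see the problem, try splitting the DGRAM line above on whitespace: the bracketed Flags field breaks into two tokens and the empty State column silently disappears, so positional indexing no longer matches the header. A quick illustration in plain perl:

echo 'unix  2      [ ]         DGRAM                    8760     /run/systemd/notify' |
    perl -ne 'my @f=split(/\s+/); print "$_: $f[$_]\n" for 0..$#f;'
0: unix
1: 2
2: [
3: ]
4: DGRAM
5: 8760
6: /run/systemd/notify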

Because of this, the requested convert_to_json script is posted at the very bottom of this answer.

Simple space-based splitting with awk

With awk you can use a fairly compact syntax (\42 is the awk octal escape for a double quote character):

netstat -an |
    awk '/CLOSE_WAIT/{
        printf "  { \42%s\42:\42%s\42,\42%s\42:\42%s\42},\n","name",$1,"other",$3
    }' |
    sed '1s/^/[\n/;$s/,$/\n]/'

Simple space-based splitting with perl, using a JSON library

This perl approach is more flexible:

netstat -an | perl -MJSON::XS -ne 'push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;END{print encode_json(\@out)."\n";}'

or the same, split over several lines:

netstat -an |
    perl -MJSON::XS -ne '
        push @out,{"name"=>$1,"other"=>$2} if
                /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
        END{print encode_json(\@out)."\n";
}'

Or pretty-printed:

netstat -an | perl -MJSON::XS -ne '
    push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
    END{$coder = JSON::XS->new->ascii->pretty->allow_nonref;
        print $coder->encode(\@out);}'

Finally, I like this version, which does not rely on regex captures to extract the fields:

netstat -an | perl -MJSON::XS -ne '
    do {
        my @line=split(/\s+/);
        push @out,{"name"=>$line[0],"other"=>$line[2]}
    } if /CLOSE_WAIT/;
    END{
        $coder = JSON::XS->new->ascii->pretty->allow_nonref;
        print $coder->encode(\@out);
    }'

You could also run the command from inside the perl script itself:

perl -MJSON::XS -e '
    open STDIN,"netstat -an|";
    my @out;
    while (<>){
        push @out,{"name"=>$1,"other"=>$2} if /^(\S+)\s+\d+\s+(\d+)\s.*CLOSE_WAIT/;
    };
    print encode_json \@out;'

This could become a basic prototype:

#!/usr/bin/perl -w

use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;

$ENV{'LANG'}='C';
open STDIN,"netstat -naut|";
my @out;
my @fields;

# Default filter: any line containing a colon; override it with the first argument
my $searchre=":";
$searchre = shift @ARGV if @ARGV;

while (<>){
    # Header line: store the field names (keeping the space in "Local Address" / "Foreign Address")
    map { s/_/ /g;push @fields,$_; } split(/\s+/) if
        /^Proto.*State/ && s/\sAddr/_Addr/g;
    # Data line matching the filter: map each field name to its column by position
    do {
        my @line=split(/\s+/);
        my %entry;
        for my $i (0..$#fields) {
            $entry{$fields[$i]}=$line[$i];
        };
        push @out,\%entry;
    } if /$searchre/;
}

print $coder->encode(\@out);

(Without an argument, this will dump the entire netstat -naut output, but you could give any search string as an argument, like CLOSE or an IP.)
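
For example, assuming the prototype above is saved as netstat_prototype.pl and made executable (the file name is just for illustration), filtering for CLOSE_WAIT would produce something like this, with values and key order depending on your machine:

./netstat_prototype.pl CLOSE_WAIT
[
   {
      "State" : "CLOSE_WAIT",
      "Recv-Q" : "1",
      "Proto" : "tcp",
      "Send-Q" : "0",
      "Local Address" : "10.0.2.15:51074",
      "Foreign Address" : "123.45.101.207:443"
   }
]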

Positional parameters, netstat2json.pl

This method can work with many tools other than netstat, with some adjustments:

#!/usr/bin/perl -w
use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;
$ENV{'LANG'}='C';
open STDIN,"netstat -nap|";
my ( $searchre ,@out,%fields)=( "[/:]" );
$searchre = shift @ARGV if @ARGV;
while (<>){
    # Skip the "Active ..." banner lines
    next if /^Active\s.*\)$/;
    # Header line: record each column's name, start offset and width
    /^Proto.*State/ && do {
        s/\s(name|Addr)/_$1/g;
        my @head;
        map { s/_/ /g;push @head,$_; } split(/\s+/);
        s/_/ /g;
        %fields=();
        for my $i (0..$#head) {
            my $crt=index($_,$head[$i]);
            my $next=-1;
            $next=index($_,$head[$i+1])-$crt-1 if $i < $#head;
            $fields{$head[$i]}=[$crt,$next];
        }
        next;
    };
    # Data line matching the filter: cut each fixed-width column, then trim spaces
    do {
        my $line=$_;
        my %entry;
        for my $i (keys %fields) {
            my $crt=substr($line,$fields{$i}[0],$fields{$i}[1]);
            $crt=~s/^\s*(\S(|.*\S))\s*$/$1/;
            $entry{$i}=$crt;
        };
        push @out,\%entry;
    } if /$searchre/;
}
print $coder->encode(\@out);

The script:

  • finds the header line Proto.*State (specific to netstat)
  • stores the field names with their position and length
  • splits each line by field length, then trims the spaces
  • dumps the variable as a json string.

This can be run with an argument, as before:

./netstat2json.pl CLOS
[
   {
      "Local Address" : "127.0.0.1:31001",
      "State" : "CLOSE_WAIT",
      "Recv-Q" : "18",
      "Proto" : "tcp",
      "Send-Q" : "0",
      "Foreign Address" : "127.0.0.1:55938",
      "PID/Program name" : "-"
   },
   {
      "Recv-Q" : "1",
      "Local Address" : "::1:53816",
      "State" : "CLOSE_WAIT",
      "Send-Q" : "0",
      "PID/Program name" : "-",
      "Foreign Address" : "::1:631",
      "Proto" : "tcp6"
   }
]

And empty fields don't break the variable assignment:

./netstat2json.pl 1000.*systemd/notify
[
   {
      "Proto" : "unix",
      "I-Node" : "33378",
      "RefCnt" : "2",
      "Path" : "/run/user/1000/systemd/notify",
      "PID/Program name" : "-",
      "Type" : "DGRAM",
      "Flags" : "[ ]",
      "State" : ""
   }
]

Note: this modified version runs netstat with the -nap arguments to get the PID/Program name field.

If not run as the superuser root, you may get this output on STDERR:

(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)

You can avoid these messages:

  • by running netstat2json.pl 2>/dev/null,
  • by running it as root or with sudo, or
  • by editing line #6, changing "netstat -nap|" to "netstat -na|".

convert_to_json.pl: a perl script to transform STDIN to json

Here is the convert_to_json.pl perl script, strictly as requested, to be run as netstat -an | grep CLOSE | ./convert_to_json.pl 1,3 name,other:

#!/usr/bin/perl -w

use strict;
use JSON::XS;
my $coder = JSON::XS->new->ascii->pretty->allow_nonref;

my (@fields,@pos,@out);

# First argument: comma-separated column numbers (1-based, converted to 0-based indexes)
map {
    push @pos,1*$_-1
} split ",",shift @ARGV;

# Second argument: comma-separated output field names
map {
    push @fields,$_
} split ",",shift @ARGV;

die "Number of fields doesn't match number of positions" if $#fields != $#pos;

# Split each input line on whitespace and keep only the requested columns
while (<>) {
    my @line=split(/\s+/);
    my %entry;
    for my $i (0..$#fields) {
         $entry{$fields[$i]}=$line[$pos[$i]];
    };
    push @out,\%entry;
}
print $coder->encode(\@out);
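
For example, with a few connections in CLOSE_WAIT, the requested pipeline would produce something like this (the values shown here are only illustrative):

netstat -an | grep CLOSE | ./convert_to_json.pl 1,3 name,other
[
   {
      "name" : "tcp4",
      "other" : "0"
   },
   {
      "name" : "tcp6",
      "other" : "0"
   }
]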

Answered by F. Hauri


Here's my Ruby version:

#! /usr/bin/env ruby
#
# Converts stdin columns to a JSON array of hashes
#
# Installation: Save as convert_to_json, make it executable and put it somewhere in PATH. Ruby must be installed.
#
# Examples:
#
# netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
# ls -l | convert_to_json
# ls -l | convert_to_json 6,7,8,9
# ls -l | convert_to_json 6,7,8,9 month,day,time,name
# convert_to_json 1,2 time,value ";" < some_file.csv
#
#
# http://stackoverflow.com/questions/40246134/convert-arbitrary-output-to-json-by-column-in-the-terminal

require 'json'

script_name = File.basename(__FILE__)
syntax = "Syntax: command_which_outputs_columns | #{script_name} column1_id,column2_id,...,columnN_id column1_name,column2_name,...,columnN_name delimiter"


if $stdin.tty? or $stdin.closed? then
  $stderr.puts syntax
else
  if ARGV[2]
    delimiter = ARGV[2]
    $stderr.puts "#{script_name} : Using #{delimiter} as delimiter"
  else
    delimiter = /\s+/
  end

  column_ids = (ARGV[0] || "").split(',').map{|column_id| column_id.to_i-1}
  column_names = (ARGV[1] || "").split(',')

  results = []
  $stdin.each do |stdin_line|
    if column_ids.empty?
      values = stdin_line.strip.split(delimiter)
    else
      values = stdin_line.strip.split(delimiter).values_at(*column_ids)
    end
    line_hash=Hash.new
    values.each_with_index.each{|value,i|
      column_name = column_names[i] || "column#{(column_ids[i] || i)+1}"
      line_hash[column_name]=value
    }
    results<<line_hash
  end
  puts JSON.pretty_generate(results)
end

It works as defined in your example:

netstat -a | grep CLOSE_WAIT | convert_to_json 1,3 name,other
[
  {
    "name": "tcp",
    "other": "0"
  },
  {
    "name": "tcp6",
    "other": "0"
  }
]

As a bonus, you can

  • omit the column parameters: every column will be converted to json
  • omit the names: columns will be called column1, column2, ...
  • choose a missing column: its value will be null
  • define a delimiter as the third parameter (the default is whitespace)

Other examples:

netstat -a | grep CLOSE_WAIT | ./convert_to_json
# [
#   {
#     "column1": "tcp",
#     "column2": "1",
#     "column3": "0",
#     "column4": "10.0.2.15:51074",
#     "column5": "123.45.101.207:https",
#     "column6": "CLOSE_WAIT"
#   },
#   {
#     "column1": "tcp6",
#     "column2": "1",
#     "column3": "0",
#     "column4": "ip6-localhost:50293",
#     "column5": "ip6-localhost:ipp",
#     "column6": "CLOSE_WAIT"
#   }
# ]

netstat -a | grep CLOSE_WAIT | ./convert_to_json 1,3
# [
#   {
#     "column1": "tcp",
#     "column3": "0"
#   },
#   {
#     "column1": "tcp6",
#     "column3": "0"
#   }
# ]

ls -l | tail -n3 | convert_to_json 6,7,8,9 month,day,time,name
# [
#   {
#     "month": "Oct",
#     "day": "27",
#     "time": "10:35",
#     "name": "test.dot"
#   },
#   {
#     "month": "Nov",
#     "day": "2",
#     "time": "14:27",
#     "name": "uniq.rb"
#   },
#   {
#     "month": "Nov",
#     "day": "2",
#     "time": "14:27",
#     "name": "utf8_nokogiri.rb"
#   }
# ]

# NOTE: ls -l uses the 8th column for the year, not the time, for older files:
ls --full-time -t /usr/share/doc | tail -n3 | ./convert_to_json 6,7,9 yyyymmdd,time,name
[
  {
    "yyyymmdd": "2013-10-21",
    "time": "15:15:20.000000000",
    "name": "libbz2-dev"
  },
  {
    "yyyymmdd": "2013-10-10",
    "time": "16:27:32.000000000",
    "name": "zsh"
  },
  {
    "yyyymmdd": "2013-10-03",
    "time": "18:52:45.000000000",
    "name": "manpages-dev"
  }
]

ls -l | tail -n3 | convert_to_json 9,12
# [
#   {
#     "column9": "test.dot",
#     "column12": null
#   },
#   {
#     "column9": "uniq.rb",
#     "column12": null
#   },
#   {
#     "column9": "utf8_nokogiri.rb",
#     "column12": null
#   }
# ]

convert_to_json 1,2 time,value ";" < some_file.csv
# convert_to_json : Using ; as delimiter
# [
#   {
#     "time": "1",
#     "value": "3"
#   },
#   {
#     "time": "2",
#     "value": "5"
#   }
# ]

Answered by Eric Duminil