
hash using sha1sum using awk

Tags: sed, hash, awk, sha1

I have a pipe-separated file with about 20 columns. I want to hash just the first column, which is a number such as an account number, using sha1sum and return the rest of the columns as is.

What's the best way to do this using awk or sed?

Accountid|Time|Category|.....
8238438|20140101021301|sub1|...
3432323|20140101041903|sub2|...
9342342|20140101050303|sub1|...

Above is an example of the text file showing just 3 columns. Only the first column has the hash function applied to it. The result should look like:

Accountid|Time|Category|.....
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...
user1189851 asked Sep 01 '25 06:09


1 Answer

What the Best Way™ is is up for debate. One way to do it with awk is

awk -F'|' 'BEGIN { OFS=FS } NR == 1 { print } NR != 1 { gsub(/'\''/, "'\'\\\\\'\''", $1); command = ("echo '\''" $1 "'\'' | sha1sum -b | cut -d\\  -f 1"); command | getline hash; close(command); $1 = hash; print }' filename

That is

BEGIN {
  OFS = FS          # set output field separator to field separator; we will use
                    # it because we meddle with the fields.
}
NR == 1 {           # first line: just print headers.
  print
}
NR != 1 {           # from there on do the hash/replace
  # this constructs a shell command (and runs it) that echoes the field
  # (single-quoted to prevent surprises) through sha1sum -b, cuts out the hash
  # and gets it back into awk with getline (into the variable hash).
  # the gsub bit is to prevent the shell from barfing if there's an apostrophe
  # in the field.
  gsub(/'/, "'\\''", $1);
  command = ("echo '" $1 "' | sha1sum -b | cut -d\\  -f 1")
  command | getline hash
  close(command)

  # then replace the field and print the result.
  $1 = hash
  print
}

You will notice the differences between the shell command at the top and the awk code at the bottom; they are all due to shell quoting. Because I put the awk code in single quotes in the shell command (double quotes are not an option in that context, what with $1 and all), and because the code itself contains single quotes, making it work inline leads to a nightmare of backslashes. My advice is therefore to put the awk code into a file, say foo.awk, and run

awk -F'|' -f foo.awk filename

instead.
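
To make the foo.awk suggestion concrete, here is a minimal sketch of the whole round trip. It assumes sha1sum and cut are installed; sample.txt and hashed.txt are made-up file names, and the three-column sample only stands in for the real 20-column file. The quoted 'EOF' delimiter stops the shell from touching anything inside the here-document, so the awk code can be written without any extra backslashes:

```shell
# Hypothetical sample input, trimmed to three columns.
cat > sample.txt <<'EOF'
Accountid|Time|Category
8238438|20140101021301|sub1
3432323|20140101041903|sub2
9342342|20140101050303|sub1
EOF

# Same logic as the commented script; <<'EOF' keeps $1 and the quotes intact.
cat > foo.awk <<'EOF'
BEGIN { OFS = FS }
NR == 1 { print }
NR != 1 {
  gsub(/'/, "'\\''", $1)
  command = ("echo '" $1 "' | sha1sum -b | cut -d\\  -f 1")
  command | getline hash
  close(command)
  $1 = hash
  print
}
EOF

# Hash the first column and keep the remaining columns unchanged.
awk -F'|' -f foo.awk sample.txt > hashed.txt
cat hashed.txt
```

Keep in mind that echo appends a newline, so each hash is that of the account number followed by a newline, and that this starts one shell pipeline per input line, which may be slow on large files.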

Wintermute answered Sep 03 '25 02:09