Has anyone implemented the POSIX 1003.2 compiliant CRC algorithm (as output by cksum
) in awk/gawk? I'm needing to do a checksum on an in memory string (not the whole file) and shelling out to call cksum
is slow and expensive.
My overall need is to generate a numerical checksum that fits within 10 digits or less. Other hash/CRC functions could work too, anyone have any thing handy?
A Google search and a scan of awk.info turned up nothing interesting.
Here is a awk (gawk) implimentation of crc32.
Notice that we use T and X as our lookup table. T
is used for the crc32_table, and X
is used for lookup of data to int value.
If you wish, you can compute the crc32_table on runtime, however it was a bit slow on startup, so you would have a tradeoff between small codesize and slow tartup
, or Reasonable speed crc32 calculation and large code size
. I would recomend version with a crc_table, as the code size increace was well justifyable when compareson of speed was done.
If you have a version of awk that does not support and(),xor(),compl(),lshift(),rshift()
then do not forget to load the bitwise operation libs.
BEGIN{
# Initialize CRC32 table
T[0]=0x00000000;
T[1]=0x04c11db7;T[2]=0x09823b6e;T[3]=0x0d4326d9;T[4]=0x130476dc;T[5]=0x17c56b6b;
T[6]=0x1a864db2;T[7]=0x1e475005;T[8]=0x2608edb8;T[9]=0x22c9f00f;T[10]=0x2f8ad6d6;
T[11]=0x2b4bcb61;T[12]=0x350c9b64;T[13]=0x31cd86d3;T[14]=0x3c8ea00a;T[15]=0x384fbdbd;
T[16]=0x4c11db70;T[17]=0x48d0c6c7;T[18]=0x4593e01e;T[19]=0x4152fda9;T[20]=0x5f15adac;
T[21]=0x5bd4b01b;T[22]=0x569796c2;T[23]=0x52568b75;T[24]=0x6a1936c8;T[25]=0x6ed82b7f;
T[26]=0x639b0da6;T[27]=0x675a1011;T[28]=0x791d4014;T[29]=0x7ddc5da3;T[30]=0x709f7b7a;
T[31]=0x745e66cd;T[32]=0x9823b6e0;T[33]=0x9ce2ab57;T[34]=0x91a18d8e;T[35]=0x95609039;
T[36]=0x8b27c03c;T[37]=0x8fe6dd8b;T[38]=0x82a5fb52;T[39]=0x8664e6e5;T[40]=0xbe2b5b58;
T[41]=0xbaea46ef;T[42]=0xb7a96036;T[43]=0xb3687d81;T[44]=0xad2f2d84;T[45]=0xa9ee3033;
T[46]=0xa4ad16ea;T[47]=0xa06c0b5d;T[48]=0xd4326d90;T[49]=0xd0f37027;T[50]=0xddb056fe;
T[51]=0xd9714b49;T[52]=0xc7361b4c;T[53]=0xc3f706fb;T[54]=0xceb42022;T[55]=0xca753d95;
T[56]=0xf23a8028;T[57]=0xf6fb9d9f;T[58]=0xfbb8bb46;T[59]=0xff79a6f1;T[60]=0xe13ef6f4;
T[61]=0xe5ffeb43;T[62]=0xe8bccd9a;T[63]=0xec7dd02d;T[64]=0x34867077;T[65]=0x30476dc0;
T[66]=0x3d044b19;T[67]=0x39c556ae;T[68]=0x278206ab;T[69]=0x23431b1c;T[70]=0x2e003dc5;
T[71]=0x2ac12072;T[72]=0x128e9dcf;T[73]=0x164f8078;T[74]=0x1b0ca6a1;T[75]=0x1fcdbb16;
T[76]=0x018aeb13;T[77]=0x054bf6a4;T[78]=0x0808d07d;T[79]=0x0cc9cdca;T[80]=0x7897ab07;
T[81]=0x7c56b6b0;T[82]=0x71159069;T[83]=0x75d48dde;T[84]=0x6b93dddb;T[85]=0x6f52c06c;
T[86]=0x6211e6b5;T[87]=0x66d0fb02;T[88]=0x5e9f46bf;T[89]=0x5a5e5b08;T[90]=0x571d7dd1;
T[91]=0x53dc6066;T[92]=0x4d9b3063;T[93]=0x495a2dd4;T[94]=0x44190b0d;T[95]=0x40d816ba;
T[96]=0xaca5c697;T[97]=0xa864db20;T[98]=0xa527fdf9;T[99]=0xa1e6e04e;T[100]=0xbfa1b04b;
T[101]=0xbb60adfc;T[102]=0xb6238b25;T[103]=0xb2e29692;T[104]=0x8aad2b2f;T[105]=0x8e6c3698;
T[106]=0x832f1041;T[107]=0x87ee0df6;T[108]=0x99a95df3;T[109]=0x9d684044;T[110]=0x902b669d;
T[111]=0x94ea7b2a;T[112]=0xe0b41de7;T[113]=0xe4750050;T[114]=0xe9362689;T[115]=0xedf73b3e;
T[116]=0xf3b06b3b;T[117]=0xf771768c;T[118]=0xfa325055;T[119]=0xfef34de2;T[120]=0xc6bcf05f;
T[121]=0xc27dede8;T[122]=0xcf3ecb31;T[123]=0xcbffd686;T[124]=0xd5b88683;T[125]=0xd1799b34;
T[126]=0xdc3abded;T[127]=0xd8fba05a;T[128]=0x690ce0ee;T[129]=0x6dcdfd59;T[130]=0x608edb80;
T[131]=0x644fc637;T[132]=0x7a089632;T[133]=0x7ec98b85;T[134]=0x738aad5c;T[135]=0x774bb0eb;
T[136]=0x4f040d56;T[137]=0x4bc510e1;T[138]=0x46863638;T[139]=0x42472b8f;T[140]=0x5c007b8a;
T[141]=0x58c1663d;T[142]=0x558240e4;T[143]=0x51435d53;T[144]=0x251d3b9e;T[145]=0x21dc2629;
T[146]=0x2c9f00f0;T[147]=0x285e1d47;T[148]=0x36194d42;T[149]=0x32d850f5;T[150]=0x3f9b762c;
T[151]=0x3b5a6b9b;T[152]=0x0315d626;T[153]=0x07d4cb91;T[154]=0x0a97ed48;T[155]=0x0e56f0ff;
T[156]=0x1011a0fa;T[157]=0x14d0bd4d;T[158]=0x19939b94;T[159]=0x1d528623;T[160]=0xf12f560e;
T[161]=0xf5ee4bb9;T[162]=0xf8ad6d60;T[163]=0xfc6c70d7;T[164]=0xe22b20d2;T[165]=0xe6ea3d65;
T[166]=0xeba91bbc;T[167]=0xef68060b;T[168]=0xd727bbb6;T[169]=0xd3e6a601;T[170]=0xdea580d8;
T[171]=0xda649d6f;T[172]=0xc423cd6a;T[173]=0xc0e2d0dd;T[174]=0xcda1f604;T[175]=0xc960ebb3;
T[176]=0xbd3e8d7e;T[177]=0xb9ff90c9;T[178]=0xb4bcb610;T[179]=0xb07daba7;T[180]=0xae3afba2;
T[181]=0xaafbe615;T[182]=0xa7b8c0cc;T[183]=0xa379dd7b;T[184]=0x9b3660c6;T[185]=0x9ff77d71;
T[186]=0x92b45ba8;T[187]=0x9675461f;T[188]=0x8832161a;T[189]=0x8cf30bad;T[190]=0x81b02d74;
T[191]=0x857130c3;T[192]=0x5d8a9099;T[193]=0x594b8d2e;T[194]=0x5408abf7;T[195]=0x50c9b640;
T[196]=0x4e8ee645;T[197]=0x4a4ffbf2;T[198]=0x470cdd2b;T[199]=0x43cdc09c;T[200]=0x7b827d21;
T[201]=0x7f436096;T[202]=0x7200464f;T[203]=0x76c15bf8;T[204]=0x68860bfd;T[205]=0x6c47164a;
T[206]=0x61043093;T[207]=0x65c52d24;T[208]=0x119b4be9;T[209]=0x155a565e;T[210]=0x18197087;
T[211]=0x1cd86d30;T[212]=0x029f3d35;T[213]=0x065e2082;T[214]=0x0b1d065b;T[215]=0x0fdc1bec;
T[216]=0x3793a651;T[217]=0x3352bbe6;T[218]=0x3e119d3f;T[219]=0x3ad08088;T[220]=0x2497d08d;
T[221]=0x2056cd3a;T[222]=0x2d15ebe3;T[223]=0x29d4f654;T[224]=0xc5a92679;T[225]=0xc1683bce;
T[226]=0xcc2b1d17;T[227]=0xc8ea00a0;T[228]=0xd6ad50a5;T[229]=0xd26c4d12;T[230]=0xdf2f6bcb;
T[231]=0xdbee767c;T[232]=0xe3a1cbc1;T[233]=0xe760d676;T[234]=0xea23f0af;T[235]=0xeee2ed18;
T[236]=0xf0a5bd1d;T[237]=0xf464a0aa;T[238]=0xf9278673;T[239]=0xfde69bc4;T[240]=0x89b8fd09;
T[241]=0x8d79e0be;T[242]=0x803ac667;T[243]=0x84fbdbd0;T[244]=0x9abc8bd5;T[245]=0x9e7d9662;
T[246]=0x933eb0bb;T[247]=0x97ffad0c;T[248]=0xafb010b1;T[249]=0xab710d06;T[250]=0xa6322bdf;
T[251]=0xa2f33668;T[252]=0xbcb4666d;T[253]=0xb8757bda;T[254]=0xb5365d03;T[255]=0xb1f740b4;
# Init raw data to int lookup table
for(i=0;i<=255;i++)X[sprintf("%c",i)]=i;
}
Then calculate the crc32
# Limit var size to 32bit
function u32(v){return and(v,0xffffffff)}
{
# Lets try with $0 as buf in this example.
buf = $0;
# Step 1) Start CRC32 calculation.
len = 0; #// Total size.
crc=u32( 0 ); #// Initial seed. POSIX compatible crc32 uses 0
# Step 2) Repeat this for as many buf as nessary, we assume "buf" contains data.
A[0]=split(buf,A,"");
len += A[0]
for(i=1;i<=A[0];i++)crc=u32(xor(u32(lshift(crc,8)),T[u32(and(xor(rshift(crc,24),X[A[i]]),0xFF))]));
# Step 3) End CRC32 calculation. Calculate the total size of buf read, and write into CRC
while(len){crc=u32(xor(u32(lshift(crc,8)),T[u32(and(xor(rshift(crc,24),and(len,0xFF)),0xFF))]));len=rshift(len,8);}
crc=u32(compl(crc));
print "crc=["crc"]";
}
crc=[4294967295] <-- ""
crc=[1220704766] <-- "a"
crc=[1219131554] <-- "abc"
crc=[3644109718] <-- "message digest"
All crc32 values match, exactly like how cksum
command does it :-)
Since cksum
uses a large table, it's probably impractical to re-implement it in AWK. You might be able to calculate it on the fly without using a table, but that's likely to be slower than calling cksum
.
References:
cksum
sourceTranslating it from C to AWK should be fairly trivial, however, if someone were so inclined.
By the way, gawk
has coprocesses:
gawk 'BEGIN {
cmd="cksum"
print "hello" |& cmd
close(cmd, "to")
while (cmd |& getline a > 0)
print a
close(cmd)
}'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With