We are getting a varying length input file as mentioned below. The text is varying length.
Input file:
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
The text here has named value pair as the content and it's of varying length. Please note that the name in the text column can contain a semi colon. We are trying to parse the input but we are not able handle it via AWK or BASH
Desired Output:
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
The below snipped of code works for ID=2, but doesn't for ID=1
echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5
Any help is greatly appreciated.
Could you please try following, written and tested with shown samples in GNU awk with new version of it. Since OP's awk version is old so if anyone having old version of awk then try changing it to awk --re-interval
awk '
BEGIN{
FS=OFS="|"
}
FNR==1{ next }
{
first=$1
while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
print first,substr($0,RSTART,RLENGTH)
$0=substr($0,RSTART+RLENGTH)
}
}' Input_file
Output will be as follows.
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
Explanation: Adding detailed explanation for above(following is for explanation purposes only).
awk ' ##Starting awk program from here.
BEGIN{ ##Starting BEGIN section from here.
FS=OFS="|" ##Setting FS and OFS wiht | here.
}
FNR==1{ next } ##If line is first line then go next, do not print anything.
{
first=$1 ##Creating first and setting as first field here.
while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
##Running while loop which has match which has a regex of matching name and value all mentioned permutations and combinations.
print first,substr($0,RSTART,RLENGTH) ##Printing first and sub string(currently matched one)
$0=substr($0,RSTART+RLENGTH) ##Saving rest of the line into current line.
}
}' Input_file ##Mentioning Input_file name here.
Sample data:
$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8
One awk solution:
awk -F"[|;]" ' # use "|" and ";" as input field delimiters
FNR==1 { next } # skip header line
{ pfx=$1 "|" # set output prefix to field 1 + "|"
printpfx=1 # set flag to print prefix
for ( i=2 ; i<=NF ; i++ ) # for fields 2 to NF
{
if ( printpfx) { printf "%s", pfx ; printpfx=0 } # if print flag == 1 then print prefix and clear flag
if ( $(i) ~ /=/ ) { printf "%s\n", $(i) ; printpfx=1 } # if current field contains "=" then print it, end this line of output, reset print flag == 1
if ( $(i) !~ /=/ ) { printf "%s;", $(i) } # if current field does not contain "=" then print it and include a ";" suffix
}
}
' name.dat
The above generates:
1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With