I am trying to clean up some data, and I would eventually like to put it in CSV form.
I have used some regular expressions to clean it up, but I'm stuck on one step.
I would like to replace all but every third newline (\n) with a comma.
The data looks like this:
field1
field2
field3
field1
field2
field3
etc.
I need it in this form:
field1,field2,field3
field1,field2,field3
Anyone have a simple way to do this using sed or awk? I could write a program and use a loop with a mod counter to erase every 1st and 2nd newline char, but I'd rather do it from the command line if possible.
With awk:
awk '{n2=n1;n1=n;n=$0;if(NR%3==0){printf"%s,%s,%s\n",n2,n1,n}}' yourData.txt
This script saves the last three lines and prints them at every third line. Unfortunately, it works only when the number of lines in the file is a multiple of 3; any leftover lines at the end are silently dropped.
A more general script is:
awk '{l=l$0;if(NR%3==0){print l;l=""}else{l=l","}}END{if(l!=""){print substr(l,1,length(l)-1)}}' yourData.txt
In this case, up to three lines are concatenated into a single string, with a comma appended after every line whose number is not a multiple of 3. At the end of the file, if the string is not empty, it is printed with the trailing comma removed.
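For readability, the same logic can be spread over several lines and wrapped in a shell function. This is only a sketch: the function name join3 and the variable name row are made up for illustration, and the function reads from standard input rather than from yourData.txt.

```shell
# join3 is a hypothetical helper name; it reads lines from stdin
# and joins every three of them with commas.
join3() {
  awk '
    { row = row $0 }                          # append the current line to the pending row
    NR % 3 == 0 { print row; row = ""; next } # every third line: emit the row and reset it
    { row = row "," }                         # otherwise add a separator for the next field
    END {
      # leftover lines (count not a multiple of 3): print them without the trailing comma
      if (row != "") print substr(row, 1, length(row) - 1)
    }
  '
}

# Five input lines: one full row, then a short row of two fields.
printf '%s\n' a b c d e | join3
# prints:
# a,b,c
# d,e
```

To run it on a file, redirect the file into the function: join3 < yourData.txt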
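Awk version (reads standard input; if the line count is not a multiple of 3, the last row ends with a trailing comma and no newline):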
awk '{if (NR%3==0){print $0;}else{printf "%s,", $0;}}'
A Perl solution that is a little shorter and also handles files whose line count is not a multiple of 3:
perl -pe 's/\n/,/ if(++$i%3&&! eof)' yourData.txt
cat file | perl -ne 'chomp(); print $_, !(++$i%3) ? "\n" : ",";'