
Counting commas in a line in bash

Tags:

bash

shell

Sometimes I receive a CSV file that has a carriage return inside a cell, which splits one record across two lines. The program that takes the file as input cannot handle this.

In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?

Stuart Woodward asked May 30 '12 13:05


4 Answers

Strip everything but the commas, and then count the number of characters left:

$ echo foo,bar,baz | tr -cd , | wc -c
2
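
As a rough sketch of how this could be wired into an existing script so that it fails on a bad line: the function name `count_commas`, the variable `expected`, and the sample line are all made up for illustration, and the expected count of 2 is an assumption about the file layout.

```shell
# Hypothetical helper: print the number of commas in its first argument.
count_commas() {
    # printf avoids echo's handling of options and backslashes;
    # tr keeps only the commas, wc counts them, and the last tr
    # strips the padding some wc implementations add.
    printf '%s' "$1" | tr -cd ',' | wc -c | tr -d ' '
}

expected=2            # assumed number of commas in a valid line
line='foo,bar,baz'    # sample line, stands in for a line read from the CSV

if [ "$(count_commas "$line")" -ne "$expected" ]; then
    echo "unexpected comma count in: $line" >&2
    exit 1
fi
```

The same test can be run once per line inside a `while read` loop over the file.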
lanzz answered Sep 30 '22 10:09


To count the number of times a comma appears, you can use something like awk:

string='foo,bar,baz'   # one line of input from the CSV file
echo "$string" | awk -F "," '{print NF-1}'

But this really isn't sufficient to determine whether a field has carriage returns in it, because fields can contain commas as long as they're surrounded by quotes.
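
If the file is known not to contain quoted commas, awk can also check the whole file in one pass and report the result through its exit status. This is only a sketch under that assumption: `check_commas_awk` and the awk variable `want` are hypothetical names, and quoted commas are still counted like any other comma.

```shell
# Hypothetical helper: succeed only if every line of file $1
# contains exactly $2 commas. Commas inside quoted fields are
# counted too, so this is unsuitable for CSV with quoted commas.
check_commas_awk() {
    awk -F',' -v want="$2" '
        {
            # NF is 0 on an empty line, so guard against reporting -1.
            n = (NF > 0) ? NF - 1 : 0
            if (n != want) {
                printf "line %d: %d commas, expected %d\n", NR, n, want
                bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage in a script (input.csv is a placeholder name):
#   check_commas_awk input.csv 2 || exit 1
```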

Jon Lin answered Sep 30 '22 11:09


In pure Bash:

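while IFS=, read -ra array
do
    # Number of fields minus one. Note: undercounts by one when the line
    # ends with a comma, and prints -1 for an empty line.
    echo "$((${#array[@]} - 1))"
done < inputfile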

or

while read -r line
do
    count=${line//[^,]}
    echo "${#count}"
done < inputfile
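
To make a script fail when a line has the wrong count, the second loop could be wrapped roughly as follows. This is a sketch, not a tested drop-in: `check_commas` and its two arguments (file name and expected count) are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return non-zero if any line of file $1
# does not contain exactly $2 commas.
check_commas() {
    local file=$1 expected=$2 line commas lineno=0 status=0
    # IFS= keeps the line intact; "|| [[ -n $line ]]" also handles
    # a final line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        lineno=$((lineno + 1))
        commas=${line//[^,]/}    # delete every character that is not a comma
        if (( ${#commas} != expected )); then
            echo "line $lineno: ${#commas} commas, expected $expected" >&2
            status=1
        fi
    done < "$file"
    return "$status"
}

# Usage in a script (inputfile is a placeholder name):
#   check_commas inputfile 2 || exit 1
```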
Dennis Williamson answered Sep 30 '22 11:09


This worked better for me than the other solutions. If test.txt contains:

foo,bar,baz
baz,foo,foobar,bar

Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c' produces

2
3

This works well for streaming sources such as tailed logs.
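
One thing to watch with the xargs form: it starts a new `sh` for every line and pastes the line into the shell command, so input containing quotes or shell metacharacters can break it. A possible alternative for streams, sketched here with a made-up function name `count_commas_stream`, is a single awk process that prints the count for each line as it arrives:

```shell
# Hypothetical helper: read lines on stdin and print the comma count of each.
count_commas_stream() {
    # gsub returns the number of replacements it made; "&" puts each
    # matched comma back, so the line itself is left unchanged.
    awk '{ print gsub(/,/, "&") }'
}

# Same sample data as test.txt above.
printf 'foo,bar,baz\nbaz,foo,foobar,bar\n' | count_commas_stream
```

For the sample input this prints 2 and 3, the same as the xargs pipeline.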

Marcus answered Sep 30 '22 10:09