Sometimes I receive a CSV file which has a carriage return inside a cell. This is not an acceptable format to a program that will use it as input.
In order to detect if an input line is split, I determined that a bad line would not have the expected number of commas in it. Is there a bash or other common unix command line tool that would allow me to count the commas in the line? If necessary, I can write a Python or Perl program to do it, but if possible, I'd like to add a line or two to an existing bash script to cause it to fail if the comma count is wrong. Any ideas?
Strip everything but the commas, and then count number of characters left:
$ echo foo,bar,baz | tr -cd , | wc -c
2
To count the number of times a comma appears, you can use something like awk:
string=(line of input from CSV file)
echo "$string" | awk -F "," '{print NF-1}'
But this really isn't sufficient to determine whether a field has carriage returns in it. Fields can have commas inside as long as they're surrounded by quotes.
In pure Bash:
while IFS=, read -ra array
do
echo "$((${#array[@]} - 1))"
done < inputfile
or
while read -r line
do
count=${line//[^,]}
echo "${#count}"
done < inputfile
What worked for me better than the other solutions was this. If test.txt
has:
foo,bar,baz
baz,foo,foobar,bar
Then cat test.txt | xargs -I % sh -c 'echo % | tr -cd , | wc -c'
produces
2
3
This works very well for streaming sources, or tailing logs, etc.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With