I have a file 'records.txt' which contains over 200,000 records.
Each record is on a separate line and has multiple fields separated by a delimiter '|'.
Each row should have 35 fields, but one of the rows has a different number of fields, i.e. a different number of '|' characters.
Can someone please suggest a way in Unix to identify that row? (For example, by getting the count of '|' characters in each row of the file.)
Try this:
awk -F '|' 'NF != 35 {print NR, $0}' your_file
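If you are not sure which count is the "normal" one (35 fields means 34 '|' characters), it may help to tally the field counts first and see which value is the outlier. A rough sketch, assuming awk and sort are available; the three-row sample file here is made up purely for illustration:

```shell
# Hypothetical sample data: two rows with 3 fields, one row with 4.
sample=$(mktemp)
printf '%s\n' 'a|b|c' 'd|e|f|g' 'h|i|j' > "$sample"

# Tally how many rows have each field count; output is "field_count rows".
# The order of "for (n in count)" is unspecified, so sort the result.
histogram=$(awk -F '|' '{ count[NF]++ } END { for (n in count) print n, count[n] }' "$sample" | sort -n)
printf '%s\n' "$histogram"

rm -f "$sample"
```

On the real file you would pass records.txt instead of "$sample"; the field count that shows up on only one row is the one to filter for with the NF test above.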
This small perl script should do it:
perl -ne '$t = $_; $t =~ s/[^|]//g; print unless length($t) == 35;' records.txt
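This works by removing every character except '|' from a copy of the line, then counting what is left. Note that it tests for 35 '|' characters, which corresponds to 36 fields; if a good row has 35 fields, compare against 34 instead.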
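The same "count the delimiters" idea can also be used to print the count for every row, which is what the question asked for. A small sketch, assuming a POSIX awk; it relies on gsub returning the number of replacements it made, and the sample file is again invented for illustration:

```shell
# Hypothetical sample data: rows with 2, 3 and 2 '|' characters.
sample=$(mktemp)
printf '%s\n' 'a|b|c' 'd|e|f|g' 'h|i|j' > "$sample"

# Replace each '|' with itself; gsub's return value is the number of '|' on the line.
# Output is "line_number pipe_count".
counts=$(awk '{ print NR, gsub(/[|]/, "|") }' "$sample")
printf '%s\n' "$counts"

rm -f "$sample"
```

Run against records.txt and piped through something like `sort -k2,2n | uniq -c -f1`, this should make the odd row stand out, though that pipeline is only a suggestion.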
Greg's way with bash stuff, for the bash friends out there :)
while IFS= read -r n; do [ $(printf '%s\n' "$n" | tr -cd '|' | wc -c) -ne 35 ] && printf '%s\n' "$n"; done < records.txt