I am trying to find out the frequency of appearance of every letter in the english alphabet in an input file. How can I do this in a bash script?
you can use wc to count the number of characters in the file wc -m filename. txt.
bash [filename] runs the commands saved in a file. $@ refers to all of a shell script's command-line arguments. $1 , $2 , etc., refer to the first command-line argument, the second command-line argument, etc. Place variables in quotes if the values might have spaces in them.
`tr` command can be used with -c option to replace those characters with the second character that don't match with the first character value. In the following example, the `tr` command is used to search those characters in the string 'bash' that don't match with the character 'b' and replace them with 'a'.
Using wc command. wc command is used to know the number of lines, word count, byte and characters count etc. Count the number of words using wc -w.
My solution using grep
, sort
and uniq
.
grep -o . file | sort | uniq -c
Ignore case:
grep -o . file | sort -f | uniq -ic
Just one awk command
awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file
if you want case insensitive, add tolower()
awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file
and if you want only characters,
awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file
and if you want only digits, change /[a-zA-Z]/
to /[0-9]/
if you do not want to show unicode, do export LC_ALL=C
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With