I'm an awk newbie, so please bear with me.
The goal is to change the case of a string such that the first letter of every word is uppercase and the remaining letters are lowercase. (To keep the example simple, "word" is defined here as strictly alphabetic characters; all others are considered separators.)
I learned a nice way to make the first letter of every word uppercase from another post on this website using the following awk command:
echo 'abcd efgh ijkl mnop' | awk '{for (i=1;i <= NF;i++) {sub(".",substr(toupper($i),1,1),$i)} print}'
--> Abcd Efgh Ijkl Mnop
Making the remaining letters lowercase is easily accomplished by preceding the awk command with a tr command:
echo 'aBcD EfGh ijkl MNOP' | tr 'A-Z' 'a-z' | awk '{for (i=1;i <= NF;i++) {sub(".",substr(toupper($i),1,1),$i)} print}'
--> Abcd Efgh Ijkl Mnop
However, in the interest of learning more about awk, I wanted to change the case of all but the first letter to lowercase with a similar awk construct. I used the regular expression \B[A-Za-z]+ to match all letters of a word but the first, and the awk command substr(tolower($i),2) to provide those same letters in lowercase, as follows:
echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1;i <= NF;i++) {sub("\B[A-Za-z]+",substr(tolower($i),2),$i)} print}'
--> Abcd EFGH IJKL MNOP
Notice that the first word converted properly, but the remaining words are left unchanged. I would be very grateful for an explanation of why the remaining words did not convert properly and how to get them to do so.
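The issue is that \B (a zero-width non-word-boundary assertion in GNU awk) never reaches the regex engine as an assertion. Inside a double-quoted string, awk processes escape sequences before the regex is compiled, and \B is not a valid string escape, so gawk warns and treats it as a plain B. The pattern is then effectively B[A-Za-z]+, which matches BCD in $1 but finds nothing in $2 and the following fields, so they are not substituted and remain uppercase. An awk without the GNU regex extensions (mawk, for example) appears to treat \B as a literal B even inside a /regex/ literal, which is what this test shows:
echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1; i<=NF; ++i) { print match($i, /\B/); }}'
2 # match at the 2nd character of ABCD, where the letter B is
0 # no match for EFGH
0 # no match for IJKL
0 # no match for MNOP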
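One way to read the 2, 0, 0, 0 output above, offered as an assumption rather than something confirmed in this thread: the awk in use is matching \B as a plain B, and ABCD is the only field that contains that letter. The sketch below uses only the POSIX index() function (no regex at all) and produces the same numbers. The function name literal_b_positions is made up for illustration.

```shell
# Hypothetical helper: print the position of a literal "B" in each field
# (0 means the field does not contain one).
literal_b_positions() {
  awk '{ for (i = 1; i <= NF; ++i) print index($i, "B") }'
}

echo 'ABCD EFGH IJKL MNOP' | literal_b_positions
# prints 2, 0, 0, 0 (one number per line)
```

If your awk prints 2 for every field with the match() test instead, it most likely implements \B as a real non-word-boundary assertion.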
Anyway, to achieve your result (capitalize only the first character of the line), you can operate on $0 (the whole line) instead of using a for loop:
echo 'ABCD EFGH IJKL MNOP' | awk '{print toupper(substr($0,1,1)) tolower(substr($0,2)) }'
Or, if you still want to capitalize each word separately using only awk:
awk '{for (i=1; i<=NF; ++i) { $i=toupper(substr($i,1,1)) tolower(substr($i,2)); } print }'
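The per-field loop treats whitespace as the only separator and rebuilds the line with single spaces. If you want the question's stricter definition (a word is a run of letters, everything else is a separator), something along these lines might work. It is only a sketch using POSIX awk features (match(), RSTART, RLENGTH), and the function name titlecase is invented for the example.

```shell
# Hypothetical helper: capitalize every run of ASCII letters on each input
# line, leaving all other characters (spaces, hyphens, digits) untouched.
titlecase() {
  awk '{
    out = ""
    rest = $0
    # Repeatedly find the next run of letters in the unprocessed remainder.
    while (match(rest, /[A-Za-z]+/)) {
      word = substr(rest, RSTART, RLENGTH)
      # Copy the separators that precede the word unchanged.
      out = out substr(rest, 1, RSTART - 1)
      # First letter uppercase, remaining letters lowercase.
      out = out toupper(substr(word, 1, 1)) tolower(substr(word, 2))
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }'
}

echo 'aBcD EfGh ijkl MNOP' | titlecase   # Abcd Efgh Ijkl Mnop
echo 'foo-bAR  baz_QUX' | titlecase      # Foo-Bar  Baz_Qux
```

Because it never assigns to $i, the original spacing of the line is preserved.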
When matching a regex with the sub() function or similar ones (such as gsub()), it's best to use the following form:
sub(/regex/, replacement, target)
This is different from what you have:
sub("regex", replacement, target)
So your command becomes (this relies on GNU awk, since \B and \w are gawk regex extensions):
awk '{ for (i=1;i<=NF;i++) sub(/\B\w+/, substr(tolower($i),2), $i) }1'
Results:
Abcd Efgh Ijkl Mnop
This article on String Functions may be worth a read. HTH.
I should say that there are easier ways to accomplish what you want, for example using GNU sed:
sed -r 's/\B\w+/\L&/g'
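Note that the sed command above only lowercases the tail of each word. A sketch that also uppercases the first letter, assuming GNU sed (the \U and \L case conversions in the replacement are GNU extensions) and the question's letters-only definition of a word; the function name titlecase_sed is hypothetical:

```shell
# Hypothetical helper; requires GNU sed for \U and \L in the replacement.
titlecase_sed() {
  # Group 1 is the first letter of a word, group 2 the remaining letters.
  sed -E 's/([A-Za-z])([A-Za-z]*)/\U\1\L\2/g'
}

echo 'aBcD EfGh ijkl MNOP' | titlecase_sed
```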