
Changing the case of a string with awk

Tags: regex, unix, awk

I'm an awk newbie, so please bear with me.

The goal is to change the case of a string such that the first letter of every word is uppercase and the remaining letters are lowercase. (To keep the example simple, "word" is defined here as strictly alphabetic characters; all others are considered separators.)

I learned a nice way to make the first letter of every word uppercase from another post on this website using the following awk command:

echo 'abcd efgh ijkl mnop' | awk '{for (i=1;i <= NF;i++) {sub(".",substr(toupper($i),1,1),$i)} print}' --> Abcd Efgh Ijkl Mnop

Making the remaining letters lowercase is easily accomplished by preceding the awk command with a tr command:

echo 'aBcD EfGh ijkl MNOP' | tr 'A-Z' 'a-z' | awk '{for (i=1;i <= NF;i++) {sub(".",substr(toupper($i),1,1),$i)} print}' --> Abcd Efgh Ijkl Mnop

However, in the interest of learning more about awk, I wanted to change the case of all but the first letter to lowercase with a similar awk construct. I used the regular expression \B[A-Za-z]+ to match all letters of a word but the first, and the awk command substr(tolower($i),2) to provide those same letters in lowercase, as follows:

echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1;i <= NF;i++) {sub("\B[A-Za-z]+",substr(tolower($i),2),$i)} print}' --> Abcd EFGH IJKL MNOP

Notice that the first word converted properly, but the remaining words are left unchanged. I would be very grateful for an explanation of why the remaining words did not convert properly and how to get them to do so.

scolfax asked Jan 03 '13 13:01


2 Answers

The issue appears to be that \B (a zero-width match at a non-word boundary) only matches at the beginning of the line, so $1 is converted while $2 and the following fields do not match the regex and remain uppercase. I'm not sure why \B fails to match after the first field, since \B should match anywhere inside any word:

echo 'ABCD EFGH IJKL MNOP' | awk '{for (i=1; i<=NF; ++i) { print match($i, /\B/); }}'
2   # \B matches ABCD at 2nd character as expected
0   # no match for EFGH
0   # no match for IJKL
0   # no match for MNOP

Anyway, if you only need to capitalize the first character of the line, you can operate on $0 (the whole line) instead of using a for loop:

echo 'ABCD EFGH IJKL MNOP' | awk '{print toupper(substr($0,1,1)) tolower(substr($0,2)) }'

Or, if you still want to capitalize each word separately using only awk:

awk '{for (i=1; i<=NF; ++i) { $i=toupper(substr($i,1,1)) tolower(substr($i,2)); } print }'
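
The loop above splits on whitespace, and assigning to $i rebuilds the line with single spaces between fields. The question defines a "word" as strictly alphabetic characters with everything else acting as a separator, so here is a minimal sketch for that definition, assuming ASCII letters only. The function name titlecase is made up for illustration; it walks the line one character at a time and uses only portable awk features.

```shell
# titlecase: hypothetical helper name, not taken from the posts above.
# A "word" is any run of ASCII letters; every other character is a separator.
titlecase() {
    awk '{
        out = ""
        in_word = 0                       # 1 while inside a run of letters
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c ~ /[A-Za-z]/) {
                # first letter of a run is uppercased, the rest lowercased
                out = out (in_word ? tolower(c) : toupper(c))
                in_word = 1
            } else {
                out = out c               # separators are copied unchanged
                in_word = 0
            }
        }
        print out
    }'
}

echo 'aBcD-efGH ijkl_MNOP' | titlecase    # prints: Abcd-Efgh Ijkl_Mnop
```

Because it never assigns to a field, this version also leaves the original spacing and punctuation untouched.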
Anders Johansson answered Sep 24 '22 04:09


When matching a regex with sub() or similar functions (such as gsub()), it's best to write the pattern as a regex constant between slashes:

sub(/regex/, replacement, target)

This is different from what you have:

sub("regex", replacement, target)

So your command becomes:

awk '{ for (i=1;i<=NF;i++) sub(/\B\w+/, substr(tolower($i),2), $i) }1'

Results:

Abcd Efgh Ijkl Mnop
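
A likely reason the quoted form behaves differently: awk processes escape sequences in a string constant before the result is used as a regex, and gawk documents that an unrecognised escape such as "\B" is treated as a plain B (it prints a warning). Assuming that is what happened in the question, the pattern actually applied was B[A-Za-z]+, which can only match in a word that contains a literal B. The sketch below uses that assumed pattern directly (the function name is made up for illustration) and reproduces the output reported in the question:

```shell
# lower_tail_with_B: hypothetical name for this demonstration.
# /B[A-Za-z]+/ is what "\B[A-Za-z]+" is assumed to become once the
# backslash is dropped during string-constant processing.
lower_tail_with_B() {
    awk '{ for (i = 1; i <= NF; i++) sub(/B[A-Za-z]+/, substr(tolower($i), 2), $i) } 1'
}

echo 'ABCD EFGH IJKL MNOP' | lower_tail_with_B    # prints: Abcd EFGH IJKL MNOP
```

Only ABCD contains a B, so only the first field changes. If you do need a string pattern in gawk, doubling the backslash ("\\B") should keep it intact; note that \B and \w are themselves GNU extensions rather than POSIX awk.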

This article on String Functions may be worth a read. HTH.


I should say that there are easier ways to accomplish what you want, for example using GNU sed:

sed -r 's/\B\w+/\L&/g'
Steve answered Sep 22 '22 04:09