Case insensitive string matching in awk

Question

Assume a multi-line text file, file, in which some lines start with whitespace.

$ cat file
foo Baz
  baz QUX
    QUx Quux
BaZ Qux
BazaaR

Further assume that I wish to convert to lowercase all lines that start with a keyword (e.g. "baz"), irrespective of whether (a) the keyword itself is written in lowercase, uppercase, or any combination thereof, and (b) the keyword is preceded by whitespace.

$ cat file | sought_command
foo Baz        # not to lowercase (line does not start with keyword)
  baz qux      # to lowercase
    QUx Quux
baz qux        # to lowercase
BazaaR         # not to lowercase (line does not start with keyword, but merely with a word containing the keyword)

I believe that awk is the right tool for this, but I am uncertain how to make the keyword matching case-insensitive.

$ cat file | awk '{ if($1 ~ /^ *baz/) print tolower($0); else print $0}'
foo Baz
  baz qux
    QUx Quux
BaZ Qux       # ERROR HERE: was not converted, because the keyword was not recognized.
BazaaR

EDIT 1: Adding IGNORECASE=1 appears to resolve the case-insensitivity, but now incorrectly converts the last line to lowercase.

$ cat file | awk '{IGNORECASE=1; if($1~/^ *baz/) print tolower($0); else print $0}'
foo Baz
  baz qux
    QUx Quux
baz qux
bazaar       # ERROR HERE: should not be converted to lowercase, as keyword not present (emphasis on word!).

Ed Morton · Accepted Answer

You already know about tolower(), so just use it again in the comparison and test for an exact string match instead of a partial regexp match:

awk 'tolower($1)=="baz"{$0=tolower($0)}1'
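To see what the exact-match approach does with the data from the question, here is a minimal sketch that feeds the five sample lines through the same awk program. The helper name sample is hypothetical and only stands in for cat file; everything else is plain POSIX awk, so it should not need gawk.

```shell
# Hypothetical helper: prints the five sample lines from the question.
sample() {
  printf '%s\n' 'foo Baz' '  baz QUX' '    QUx Quux' 'BaZ Qux' 'BazaaR'
}

# Lowercase the whole record when its first field is exactly "baz", ignoring case.
# Assigning to $0 as a whole keeps the leading whitespace intact.
# The trailing 1 is an always-true pattern, so every record is printed.
result=$(sample | awk 'tolower($1) == "baz" { $0 = tolower($0) } 1')
printf '%s\n' "$result"
```

Because $1 is compared as a whole string, "BazaaR" is left alone, and because awk strips leading blanks when splitting fields, the indented line still matches.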

Sundeep · Answer

Add a word boundary after the search string:

$ awk '{IGNORECASE=1; if($1~/^ *baz\>/) print tolower($0); else print $0}' ip.txt 
foo Baz
  baz qux
    QUx Quux
baz qux
BazaaR

Can be re-written as:

awk 'BEGIN{IGNORECASE=1} /^ *baz\>/{$0=tolower($0)} 1' ip.txt

Since the regexp is anchored to the start of the line, there is no need to match against $1. The 1 at the end prints the record, including any changes made.

IGNORECASE and \> are gawk-specific features. \y can also be used to match a word boundary.
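For an awk without those gawk extensions, one possible workaround (a sketch under stated assumptions, not part of the original answer) is to match against a lowercased copy of the record and spell out the word boundary by hand. The helper name sample_extra and the extra input line 'BAZ, Qux' are hypothetical additions for illustration; the bracket expression [^a-z0-9_] is an assumed approximation of "not a word character" for ASCII input.

```shell
# Hypothetical helper: the question's sample lines plus one extra line ("BAZ, Qux").
sample_extra() {
  printf '%s\n' 'foo Baz' '  baz QUX' '    QUx Quux' 'BaZ Qux' 'BazaaR' 'BAZ, Qux'
}

# Match against tolower($0), so IGNORECASE is not needed.
# ([^a-z0-9_]|$) stands in for \>: the keyword must be followed by a
# non-word character or by the end of the line.
# [ \t]* also accepts tabs as leading whitespace.
regex_result=$(sample_extra | awk 'tolower($0) ~ /^[ \t]*baz([^a-z0-9_]|$)/ { $0 = tolower($0) } 1')
printf '%s\n' "$regex_result"
```

Like the \> version, this treats punctuation as ending the keyword, so the extra line 'BAZ, Qux' is lowercased here, whereas a comparison of the whole first field against "baz" would leave it unchanged.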

With GNU sed

$ sed 's/^[[:blank:]]*baz\b.*/\L&/I' ip.txt 
foo Baz
  baz qux
    QUx Quux
baz qux
BazaaR

[[:blank:]] matches space or tab characters
\L& lowercases the matched text (here, the whole line)
\b is a word boundary
the I flag makes the match case-insensitive

Tags: string, matching, lowercase, case-sensitive, awk

Michael G
