
Pattern matching in if statement in bash

Tags: regex, bash

I'm trying to count the words with at least two vowels in all the .txt files in the directory. Here's my code so far:

#!/bin/bash

wordcount=0


for i in $HOME/*.txt
do
cat $i |
while read line
do
    for w in $line
    do
    if [[ $w == .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
    then
        wordcount=`expr $wordcount + 1`
        echo $w ':' $wordcount  
    else
        echo "In else"
    fi
    done
done
echo $i ':' $wordcount
wordcount=0
done

Here is a sample from one of the .txt files:

Last modified: Sun Aug 20 18:18:27 IST 2017
To remove PPAs
sudo apt-get install ppa-purge
sudo ppa-purge ppa:

The first problem is that the pattern in the if statement never matches any word in the text file; execution always goes to the else branch. The second is that the wordcount printed by echo $i ':' $wordcount is always 0, when it should hold the number of matching words.

Wanmi Siangshai asked Oct 11 '25 14:10


1 Answer

Immediate Issue: Glob vs Regex

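[[ $string = $pattern ]] doesn't perform regex matching; it performs a glob-style pattern match. While . means "any character" in regex, it matches only itself in a glob, so your pattern only matches words that start with a literal dot and have a literal dot right after each of the two vowels.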

You have a few options here:

  1. Use =~ instead to perform regular expression matching:

    [[ $w =~ .*[aeiouAEIOU].*[AEIOUaeiou].* ]]
    
  2. Use a glob-style expression instead of a regex:

    [[ $w = *[aeiouAEIOU]*[aeiouAEIOU]* ]]
    

    Note the use of = rather than == here; while either is valid inside [[ ]], the former avoids building muscle memory that would lead to bugs when writing code for a POSIX implementation of test / [, as = is the only string equality operator specified there.
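
To see the difference concretely, here is a minimal sketch that runs one word through the glob form, the regex form, and the original pattern. The classify helper and the sample words are made up for illustration and are not part of the question's script.

```bash
#!/usr/bin/env bash

# Hypothetical helper: report how one word fares under each matching style.
classify() {
  local w=$1 glob=no regex=no original=no
  # Regex kept in a variable and expanded unquoted so it is treated as a regex.
  local re='[aeiouAEIOU].*[aeiouAEIOU]'

  # Glob match: * means "any string", [...] is a character class.
  [[ $w = *[aeiouAEIOU]*[aeiouAEIOU]* ]] && glob=yes
  # Regex match: unanchored, so no leading or trailing .* is needed.
  [[ $w =~ $re ]] && regex=yes
  # The pattern from the question, which == interprets as a glob.
  [[ $w == .*[aeiouAEIOU].*[AEIOUaeiou].* ]] && original=yes

  printf '%s glob=%s regex=%s original=%s\n' "$w" "$glob" "$regex" "$original"
}

classify install   # install glob=yes regex=yes original=no
classify PPAs      # PPAs glob=no regex=no original=no
classify .a.e.     # .a.e. glob=yes regex=yes original=yes
```

Only the contrived word .a.e. satisfies the original pattern, because each . in it has to match a literal dot.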


Larger Issue: Properly Reading Word-By-Word

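Using for w in $line is innately unsafe: the unquoted expansion is not only split into words but also glob-expanded, so a word such as * is replaced by the names of the files in the current directory. Piping cat into the while loop would also keep the count at 0 even with a correct pattern: each part of a pipeline runs in a subshell, so the increments to wordcount are lost when the loop ends. Use read -a to read each line into an array of words, and redirect the file into the loop instead of piping it: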

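#!/usr/bin/env bash

wordcount=0
for i in "$HOME"/*.txt; do
  # Split each line on whitespace into an array; -r keeps backslashes literal.
  while read -r -a words; do
    for word in "${words[@]}"; do
      # Glob match: at least two vowels anywhere in the word.
      if [[ $word = *[aeiouAEIOU]*[aeiouAEIOU]* ]]; then
        (( ++wordcount ))
      fi
    done
  done <"$i"   # redirect, not a pipe, so wordcount survives the loop
  printf '%s: %s\n' "$i" "$wordcount"
  wordcount=0  # reset for the next file
done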
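
The two pitfalls that the loop above avoids can be reproduced in isolation. This is a sketch under stated assumptions: the temporary directory, the sample.txt file, and the variable names are invented for the demonstration, and it assumes bash's default settings (lastpipe not enabled).

```bash
#!/usr/bin/env bash

# Hypothetical sample file, created only for this demonstration.
tmpdir=$(mktemp -d)
sample=$tmpdir/sample.txt
printf '%s\n' 'To remove PPAs' 'sudo apt-get install ppa-purge' >"$sample"

# Pitfall 1: the loop on the right-hand side of a pipe runs in a subshell,
# so the counter it increments is gone once the pipeline finishes.
piped=0
cat "$sample" | while read -r -a words; do
  (( piped += ${#words[@]} ))
done
echo "piped: $piped"             # piped: 0

# Redirecting into the loop keeps it in the current shell.
redirected=0
while read -r -a words; do
  (( redirected += ${#words[@]} ))
done <"$sample"
echo "redirected: $redirected"   # redirected: 7

# Pitfall 2: an unquoted $line is glob-expanded after word splitting.
cd "$tmpdir" || exit 1
line='count * words'
unquoted=()
for w in $line; do unquoted+=("$w"); done
read -r -a safe <<<"$line"
echo "unquoted: ${unquoted[*]}"  # unquoted: count sample.txt words
echo "read -a: ${safe[*]}"       # read -a: count * words

# Clean up the demonstration directory.
cd - >/dev/null || exit 1
rm -rf "$tmpdir"
```

The piped counter stays at 0 for the same reason the question's final echo printed 0, and the * in the unquoted loop turns into sample.txt because that is the only file in the directory at that point.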
Charles Duffy answered Oct 14 '25 12:10