
How to print lines that only contain characters from a list in BASH?

Tags: regex, grep, bash

I have a file called "dictionary.txt" containing a list of all possible words, e.g.:

a
aardvark
act
anvil
ate
...

How can I search this, only printing lines containing letters from a limited list, e.g., if the list contains the letters "c", "a", and "t", a search will reveal these words:

a
act
cat

If the letters "e", "a", and "t" are searched, only these words are found from "dictionary.txt":

a
ate
eat
tea

The only solution I have managed is this:

  • Create a list of all possible letters.
  • Delete the searched letters from this list, leaving a list of letters that I do not want to search for.
  • With a for loop cycling through each of those letters, delete all lines from the dictionary that contain that letter.
  • Print the remaining words found in the dictionary.
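
For reference, the steps above might look roughly like the sketch below. The original script is not shown, so the variable names, the Latin alphabet, and the small sample word list are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the slow approach described above; names and data are hypothetical.
wanted="cat"
alphabet="abcdefghijklmnopqrstuvwxyz"   # assumed list of all possible letters

# Sample word list standing in for dictionary.txt.
words=$(printf '%s\n' a aardvark act anvil ate cat)

# First two steps: remove the searched letters, leaving the letters to exclude.
unwanted=$(printf '%s' "$alphabet" | tr -d "$wanted")

# Third step: one grep pass per unwanted letter (this is what makes it slow).
remaining=$words
for letter in $(printf '%s\n' "$unwanted" | fold -w 1); do
    remaining=$(printf '%s\n' "$remaining" | grep -v -F "$letter")
done

# Last step: print whatever survived every pass.
printf '%s\n' "$remaining"
```

With 23 unwanted letters this already runs grep 23 times, which is why the cost grows with the size of the alphabet.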

This solution is very slow. Also, I need to use this code with other languages, which have thousands of possible characters, so this search method is especially slow.

How can I print only those lines from "dictionary.txt" that contain the searched-for letters and nothing else?

Village asked May 19 '14 14:05


2 Answers

grep '^[eat]*$' dictionary.txt

Explanation:

^ = marker meaning beginning of line

$ = marker meaning end of line

[eat] = character class ("match any one of these characters")

* = quantifier applied to the character class (zero or more repetitions, so an empty line also matches)
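
To reuse the same pattern with different letter sets, the character class can be built from a variable. The following is only a sketch: the function name only_letters and the sample word list are made up for illustration, and it assumes the letters do not include characters that are special inside brackets (], ^, -).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lines of a file made up only of the given letters.
only_letters() {
    local letters=$1 file=$2
    # Same regex as above, with the letters substituted into the character class.
    grep "^[${letters}]*\$" "$file"
}

# Small sample dictionary, standing in for dictionary.txt.
dict=$(mktemp)
printf '%s\n' a aardvark act anvil ate cat eat tea > "$dict"

only_letters cat "$dict"   # prints a, act, cat (one per line)
only_letters eat "$dict"   # prints a, ate, eat, tea (one per line)
```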

amphetamachine answered Oct 22 '22 09:10


Unfortunately, I cannot comment, otherwise I'd add this to amphetamachine's answer. Anyway, given the updated condition of thousands of search characters, you may want to read the pattern from a file instead:

grep -f patterns.txt dictionary.txt

where patterns.txt contains your regexp, written without surrounding slashes (grep would treat them as literal characters):

^[eat]\+$

Below is a sample session:

$ cat << EOF > dictionary.txt
> one
> two
> cat
> eat
> four
> tea
> five
> cheat
> EOF
$ cat << EOF > patterns.txt
> ^[eat]\+$
> EOF
$ grep -f patterns.txt dictionary.txt
eat
tea
$

This way you are not limited by the maximum command-line length (the "Argument list too long" error). Also, you can specify multiple patterns in the file, one per line:

$ cat patterns.txt
^[eat]\+$
^five$
$ grep -f patterns.txt dictionary.txt
eat
tea
five
$
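
If the allowed characters themselves live in a file, patterns.txt can be generated rather than typed. This is a sketch under assumptions: letters.txt (one character per line) is a hypothetical file, none of the characters is special inside brackets (], ^, -), and \+ is supported by your grep (it is a GNU extension to basic regular expressions).

```shell
#!/usr/bin/env bash
# Work in a scratch directory; all file names below are illustrative.
workdir=$(mktemp -d)
printf '%s\n' e a t > "$workdir/letters.txt"
printf '%s\n' one two cat eat four tea five cheat > "$workdir/dictionary.txt"

# Join the letters into one string by dropping the newlines.
class=$(tr -d '\n' < "$workdir/letters.txt")

# Write the pattern ^[...]\+$ to the file; printf is a shell builtin,
# so the letters never pass through an external command's argument list.
printf '^[%s]\\+$\n' "$class" > "$workdir/patterns.txt"

grep -f "$workdir/patterns.txt" "$workdir/dictionary.txt"   # prints eat, tea
```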
galaxy answered Oct 22 '22 08:10