How to find all words appearing between parentheses?

Tags:

grep

bash

I have a file containing some words in parentheses. I'd like to compile a list of all of the unique words appearing there, e.g.:

This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.

This would be the resulting list:

text
words
123

How can I list all of the items appearing between parentheses?
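To try the answers below, the sample input can be saved as file.txt, the filename their commands read (the name is taken from those commands, not from the question). A minimal sketch using a here-document; quoting 'EOF' stops the shell from expanding anything inside it:

```shell
# Recreate the question's sample input as file.txt
# (file.txt is the name the answers' commands expect).
cat > file.txt <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF
```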

Village asked May 19 '12 01:05

3 Answers

You can use awk like this:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' file.txt

prints:

text
text
words
123
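To see why the loop starts at i=2 and steps by 2: with -F "[()]" every ( and ) acts as a field separator, so anything that sat inside a pair of parentheses ends up in an even-numbered field. A small sketch, assuming non-nested parentheses, that prints each field of the second sample line with its index (the line is piped in with printf rather than read from file.txt):

```shell
# Dump every field produced by splitting on "(" or ")".
# Text that was inside parentheses lands in fields 2, 4, 6, ...
printf '%s\n' 'This (text) has some (words) in parenthesis.' |
  awk -F '[()]' '{ for (i = 1; i <= NF; i++) printf "%d:[%s]\n", i, $i }'
# 1:[This ]
# 2:[text]
# 3:[ has some ]
# 4:[words]
# 5:[ in parenthesis.]
```

With balanced, non-nested parentheses the field count is always odd, so the bound i<NF simply stops at the last even field; it also keeps text after an unclosed ( on a line from being printed.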

You can use an array to remember which values have already been seen and print each one only once:

awk -F "[()]" '{ for (i=2; i<NF; i+=2) if (!seen[$i]++) print $i }' file.txt

prints:

text
words
123

HTH

Steve answered Oct 07 '22 16:10


With GNU grep, you can use a Perl-compatible regular expression (-P) with look-around assertions to exclude the parentheses:

grep -Po '(?<=\().*?(?=\))' file.txt | sort -u
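The ? in .*? is what makes this work on lines with several pairs: it makes the match lazy, so each match stops at the first ) instead of running on to the last one. A sketch comparing the two forms on the second sample line, assuming GNU grep built with PCRE support (-P is not available in every build):

```shell
line='This (text) has some (words) in parenthesis.'

# Lazy .*? : each match stops at the first ")" after its "(".
printf '%s\n' "$line" | grep -Po '(?<=\().*?(?=\))'
# text
# words

# Greedy .* : the match runs on to the last ")" on the line.
printf '%s\n' "$line" | grep -Po '(?<=\().*(?=\))'
# text) has some (words
```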
glenn jackman answered Oct 07 '22 17:10


grep -oE '\([[:alnum:]]*\)' file.txt | sed 's/[()]//g' | sort | uniq

  • -o prints only the matching text
  • -E enables extended regular expressions
  • \( matches a literal parenthesis
  • [[:alnum:]] is the POSIX character class for letters and digits

The sed script strips out the parentheses, and sort | uniq removes the duplicates. This was tested with GNU grep but BSD sed, so be wary of differences on other systems.
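The pipeline can also be run one stage at a time to see what each step contributes. A sketch that feeds the question's sample text in with printf instead of reading a file. The pattern here is written \([[:alnum:]]*\) with no ? after the *: POSIX leaves adjacent duplication symbols such as *? undefined in EREs, and since [[:alnum:]] cannot match ) anyway, greediness makes no difference:

```shell
sample='This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.'

echo '--- grep -oE: each match, parentheses included'
printf '%s\n' "$sample" | grep -oE '\([[:alnum:]]*\)'

echo '--- sed: parentheses stripped'
printf '%s\n' "$sample" | grep -oE '\([[:alnum:]]*\)' | sed 's/[()]//g'

echo '--- sort | uniq: duplicates removed'
printf '%s\n' "$sample" | grep -oE '\([[:alnum:]]*\)' | sed 's/[()]//g' | sort | uniq
```

Note that uniq only collapses adjacent duplicates, which is why the sort has to come first.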

Matt K answered Oct 07 '22 18:10