I have a file containing some words in parentheses. I'd like to compile a list of all of the unique words appearing there, e.g.:
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
This would be the resulting list:
text
words
123
How can I list all of the items appearing between parentheses?
Extract Text Between Parentheses: To extract the text between any characters, use a formula with the MID and FIND functions. The FIND function locates the parentheses and the MID function returns the characters between them.
Find Words in Parentheses or Brackets: In the “Find and Replace” dialog box, enter “\(*\)” in the “Find what” text box. Then click “More” to show more options and check the “Use wildcards” box. Finally, click the “Find In” button and select “Main Document”.
The simplest way to extract the string between two parentheses is to use slicing and string.find(). First, find the indices of the first occurrences of the opening and closing parentheses. Second, use them as slice indices to get the substring between those indices like so: s[s.
You can use awk like this:
awk -F "[()]" '{ for (i=2; i<NF; i+=2) print $i }' file.txt
prints:
text
text
words
123
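To see why the loop starts at field 2 and steps by 2, it can help to print every field awk produces when both parentheses act as separators. The following is a minimal sketch; show_fields is a hypothetical helper name, not part of the original answer.

```shell
# Hypothetical helper: print each awk field of one line, numbered,
# using "(" and ")" as field separators.
show_fields() {
  printf '%s\n' "$1" |
    awk -F '[()]' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

show_fields 'This (text) has some (words) in parenthesis.'
# 1: [This ]
# 2: [text]
# 3: [ has some ]
# 4: [words]
# 5: [ in parenthesis.]
```

The even-numbered fields are the ones that sat inside parentheses. The i<NF condition skips the last field, so an opening parenthesis that is never closed on the same line contributes nothing.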
You can use an array to print only the unique values:
awk -F "[()]" '{ for (i=2; i<NF; i+=2) if (!seen[$i]++) print $i }' file.txt
prints:
text
words
123
HTH
With GNU grep, you can use a perl-compatible regex with look-around assertions to exclude the parens:
grep -Po '(?<=\().*?(?=\))' file.txt | sort -u
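The lazy quantifier in that pattern matters when a line holds more than one pair of parentheses. Here is a small sketch comparing it with a greedy version, assuming a grep built with -P support; the variable names are only for illustration.

```shell
line='This (text) has some (words) in parenthesis.'

# Lazy .*? stops at the first closing paren after each opening paren.
lazy=$(printf '%s\n' "$line" | grep -Po '(?<=\().*?(?=\))')

# Greedy .* runs on to the last closing paren on the line.
greedy=$(printf '%s\n' "$line" | grep -Po '(?<=\().*(?=\))')

printf 'lazy:\n%s\n\ngreedy:\n%s\n' "$lazy" "$greedy"
# lazy:
# text
# words
#
# greedy:
# text) has some (words
```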
Another option is grep with an extended regular expression, with sed stripping the parentheses afterwards:
grep -oE '\([[:alnum:]]*\)' file.txt | sed 's/[()]//g' | sort | uniq
-o prints only the matching text.
-E means use extended regular expressions.
\( means match a literal paren.
[[:alnum:]] is the POSIX character class for letters and numbers.
The sed script strips out the parens. This was tested against GNU grep but BSD sed, so be wary.
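As a rough end-to-end check, the grep and sed pipeline can be run against the sample text from the question. This is only a sketch: unique_parens is a hypothetical wrapper name, the pattern is written with a plain *, and the sample lives in a temporary file created with mktemp.

```shell
# Hypothetical wrapper: list the unique parenthesised items in a file.
unique_parens() {
  grep -oE '\([[:alnum:]]*\)' "$1" | sed 's/[()]//g' | LC_ALL=C sort -u
}

# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
This is some (text).
This (text) has some (words) in parenthesis.
Sometimes, there are numbers, such as (123) in parenthesis too.
EOF

result=$(unique_parens "$sample")
printf '%s\n' "$result"
rm -f "$sample"
# 123
# text
# words
```

Because the output passes through sort, the items come back in sorted order rather than in order of first appearance. If the original order matters, the awk approach is the better fit.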