
Get word between quotes

I have x lines like this:

Unable to find latest released revision of 'CONTRIB_046578'.   

I need to extract the word between the single quotes (CONTRIB_046578 in this example) and, if possible, count the number of occurrences of that word. How can I do this with grep, sed, or any other command?

user1921608 asked Dec 27 '22 12:12



2 Answers

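The cleanest solution is grep -Po "(?<=')[^']+(?=')", which uses Perl-style lookarounds to print only the text between the quotes (-P needs a grep built with PCRE support, such as GNU grep):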

$ cat file
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'foo'
Unable to find latest released revision of 'bar'
Unable to find latest released revision of 'CONTRIB_046578'

# Print occurrences
$ grep -Po "(?<=')[^']+(?=')" file
CONTRIB_046578
foo
bar
CONTRIB_046578

# Count matching lines (-c counts lines, not individual matches)
$ grep -Pc "(?<=')[^']+(?=')" file
4

# Count occurrences of each distinct word
$ grep -Po "(?<=')[^']+(?=')" file | sort | uniq -c 
2 CONTRIB_046578
1 bar
1 foo
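
Since the question also mentions sed, the same extraction can be sketched without -P, which may help where grep lacks PCRE support. This is only an illustration: the temporary input file and the extract_quoted helper are hypothetical names, and the pattern assumes the word of interest sits inside the first pair of single quotes on each line.

```shell
# Sample input written to a temporary file (hypothetical, mirrors the lines above)
input=$(mktemp)
cat > "$input" <<'EOF'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'foo'
Unable to find latest released revision of 'bar'
Unable to find latest released revision of 'CONTRIB_046578'
EOF

# Print the text inside the first pair of single quotes on each line.
# With -n and the p flag, lines that contain no quoted word print nothing.
extract_quoted() {
    sed -n "s/^[^']*'\([^']*\)'.*/\1/p" "$1"
}

# Total number of extracted words (tr strips padding some wc versions add)
total=$(extract_quoted "$input" | wc -l | tr -d ' ')

# Count per distinct word, most frequent first
counts=$(extract_quoted "$input" | sort | uniq -c | sort -rn)

echo "total: $total"
echo "$counts"
rm -f "$input"
```

Unlike grep -o, this prints at most one word per line, so a line with several quoted words would need a different approach.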
Chris Seymour answered Jan 06 '23 06:01



All you need is a very simple awk script to count the occurrences of what's between the quotes:

awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file

Using @anubhava's test input file:

$ cat file
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046578'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046570'
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046572'
Unable to find latest released revision of 'CONTRIB_046579'
$
$ awk -F\' '{c[$2]++} END{for (w in c) print w,c[w]}' file
CONTRIB_046578 1
CONTRIB_046579 3
CONTRIB_046570 1
CONTRIB_046572 2
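
If the real file also contains lines without a quoted word, the one-liner above would count an empty key for them, and awk does not guarantee any particular order for "for (w in c)". A hedged variant that skips such lines and sorts the result might look like this; the sample input, including the unquoted line, is made up for illustration.

```shell
# Hypothetical sample input, including one line with no quotes at all
input=$(mktemp)
cat > "$input" <<'EOF'
Unable to find latest released revision of 'CONTRIB_046572'
Some unrelated line without quotes
Unable to find latest released revision of 'CONTRIB_046579'
Unable to find latest released revision of 'CONTRIB_046579'
EOF

# With ' as the field separator, NF >= 3 holds only when the line has at
# least two single quotes, so $2 is the text between the first pair.
# Piping through sort gives a stable, alphabetical order.
result=$(awk -F"'" 'NF >= 3 { c[$2]++ } END { for (w in c) print w, c[w] }' "$input" | sort)

echo "$result"
rm -f "$input"
```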
Ed Morton answered Jan 06 '23 07:01
