Linux tools - how to count and list occurrences of regex in file

Tags: regex, linux

I have a file with a large number of similar strings. I want to count unique occurrences of a regex, and also show what they were, e.g. for the pattern Profile: (\w*) on the file:

Profile: blah
Profile: another
Profile: trees
Profile: blah

I want to find that there are 3 occurrences, and return the results:

blah, another, trees
Stefan asked Sep 25 '13 14:09


2 Answers

Try this:

egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq

Output:

another
blah
trees

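Description

egrep (equivalent to grep -E) with the -o option prints only the part of each line that matches the pattern.

sed then strips the "Profile: " prefix, leaving only the captured group.

sort followed by uniq gives the list of unique values. uniq only collapses adjacent duplicates, which is why the input has to be sorted first.

To get the number of elements in the resulting list, append wc -l to the command: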

egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq | wc -l

Output:

3
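
The two pipelines above can also be combined so that the file is scanned once and both the count and a comma-separated list come out, in the form the question asks for. This is only a minimal sketch, not part of the original answer: the file name test.text is taken from the command above, the variable names unique, count and list are made up for illustration, and grep -E is used in place of the equivalent egrep.

```shell
# Recreate the sample file from the question (file name assumed: test.text).
printf 'Profile: blah\nProfile: another\nProfile: trees\nProfile: blah\n' > test.text

# Extract each match, strip the prefix, and keep one copy of each value.
unique=$(grep -Eo 'Profile: \w*' test.text | sed 's/^Profile: //' | sort -u)

# Count the unique values (grep -c . counts non-empty lines).
count=$(printf '%s\n' "$unique" | grep -c .)

# Join the values with ", " for display.
list=$(printf '%s\n' "$unique" | paste -sd, - | sed 's/,/, /g')

echo "$count occurrences: $list"
```

On the sample data this should print "3 occurrences: another, blah, trees". The list comes out in sorted order rather than file order, because sort -u is what removes the duplicates.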
jkshah answered Nov 05 '22 15:11


awk '{a[$2]}END{for(x in a)print x}' file

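This works on your example, where the value is always the second whitespace-separated field. Note that for (x in a) visits the keys in an unspecified order, so the output is not necessarily in file order: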

kent$  echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{for(x in a)print x}'
another
trees
blah

If you also want the count (3) in the output:

awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }' file

With the same example:

kent$  echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }'
count: 3
another
trees
blah
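
The one-liners above take every line's second field and print the values in whatever order awk iterates the array. As a hedged variation, not the original command from this answer, the sketch below only looks at lines that start with "Profile: " and remembers the order in which values first appear, so the list follows the file. The value is still assumed to be the second field, and the names seen, order, n and result are illustrative.

```shell
# Sample input from the question, piped through awk.
result=$(printf 'Profile: blah\nProfile: another\nProfile: trees\nProfile: blah\n' |
awk '
    # Only consider lines that start with the "Profile: " prefix.
    /^Profile: / {
        # Record a value the first time it is seen, preserving file order.
        if (!($2 in seen)) { seen[$2] = 1; order[++n] = $2 }
    }
    END {
        printf "count: %d\n", n
        # Join the values with ", " in first-seen order.
        for (i = 1; i <= n; i++) printf "%s%s", order[i], (i < n ? ", " : "\n")
    }
')
echo "$result"
```

On the sample input this should print "count: 3" followed by "blah, another, trees", which matches the order requested in the question. Keeping a separate order array costs a little memory but avoids piping through sort afterwards.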
Kent answered Nov 05 '22 14:11