I have a file with a large number of similar strings. I want to count unique occurrences of a regex, and also show what they were, e.g. for the pattern Profile: (\w*)
on the file:
Profile: blah
Profile: another
Profile: trees
Profile: blah
I want to find that there are 3 unique occurrences, and return the results:
blah, another, trees
Using grep -c alone counts the number of lines that contain a match, not the total number of matches. To count every match, use grep -o, which prints each match on its own line, and pipe the result to wc -l, which counts those lines. That line count is the total number of matches.
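To make the difference concrete, here is a minimal sketch. The sample file and the variable names line_count and match_count are made up for the demonstration; the first line deliberately holds two matches.

```shell
# Hypothetical sample data, written to a temporary file for this demo only.
tmp=$(mktemp)
printf 'Profile: blah Profile: another\nProfile: trees\nno match here\n' > "$tmp"

# grep -c counts matching LINES: the first line counts once despite two matches.
line_count=$(grep -c 'Profile: ' "$tmp")

# grep -o prints each match on its own line, so wc -l counts MATCHES.
# tr strips the padding that some wc implementations add around the number.
match_count=$(grep -o 'Profile: ' "$tmp" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"
rm -f "$tmp"
```

With this sample input the script prints "lines: 2, matches: 3".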
In Python, to count every match of a regex pattern in a string, use len(re.findall(pattern, string)), which returns the number of non-overlapping matches, or len([*re.finditer(pattern, string)]), which unpacks the match objects yielded by finditer into a list and takes its length.
grep -c is useful for finding how many lines of a file contain a string, but it counts each matching line only once, no matter how many times the string occurs on that line.
Counting Matches With grep
The grep command has the -c flag, which counts the number of matching lines and prints that number. This is useful for many things, such as searching through log files for the number of entries from a particular IP, endpoint, or other identifier.
Try this:
egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq
Output:
another
blah
trees
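Description
egrep with the -o option prints only the part of each line that matches the pattern.
sed keeps only the captured group, dropping the "Profile: " prefix.
sort followed by uniq gives the list of unique elements (uniq only collapses adjacent duplicates, so the input must be sorted first).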
To get the number of elements in the resulting list, append | wc -l to the command:
egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq | wc -l
Output:
3
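A possible variant of the same pipeline, offered as a sketch rather than a replacement: it uses grep -E (recent GNU grep releases flag egrep as obsolescent), the POSIX class [[:alnum:]_] in place of \w, and sort -u in place of sort | uniq. The function name unique_profiles and the temporary sample file are made up for the demonstration.

```shell
# Hypothetical sample data matching the example in the question.
tmp=$(mktemp)
printf 'Profile: blah\nProfile: another\nProfile: trees\nProfile: blah\n' > "$tmp"

# Print the unique values that follow "Profile: " in the file given as $1.
unique_profiles() {
  grep -oE 'Profile: [[:alnum:]_]*' "$1" |  # one match per line
    sed 's/^Profile: //' |                   # drop the fixed prefix
    sort -u                                  # sort and remove duplicates
}

unique_list=$(unique_profiles "$tmp")
# Count straight from the pipeline so that an empty result gives 0.
unique_count=$(unique_profiles "$tmp" | wc -l | tr -d ' ')

printf '%s\n' "$unique_list"
echo "count: $unique_count"
rm -f "$tmp"
```

On the sample data this prints another, blah and trees on separate lines, followed by "count: 3".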
awk '{a[$2]}END{for(x in a)print x}' file
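will work on your example, since the value is always the second whitespace-separated field ($2). Note that awk does not guarantee the order in which for (x in a) visits the keys.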
kent$ echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{for(x in a)print x}'
another
trees
blah
If you want to have the count (3) in the output:
awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }' file
with same example:
kent$ echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }'
count: 3
another
trees
blah
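If the file also contains lines that are not Profile lines, or if you want to know how often each value occurs, the awk approach could be extended along these lines. This is only a sketch: the array name seen, the variable per_value, and the sample data (including the "Other: skipped" line) are made up, and the output is piped through sort because awk's for-in order is unspecified.

```shell
# Hypothetical sample data with one line that should be ignored.
tmp=$(mktemp)
printf 'Profile: blah\nProfile: another\nOther: skipped\nProfile: trees\nProfile: blah\n' > "$tmp"

# Only lines starting with "Profile: " are counted; seen[$2]++ tallies
# how many times each value appears. sort makes the order predictable.
per_value=$(awk '/^Profile: / { seen[$2]++ }
                 END { for (name in seen) print name, seen[name] }' "$tmp" | sort)

printf '%s\n' "$per_value"
rm -f "$tmp"
```

On the sample data this prints "another 1", "blah 2" and "trees 1", one per line.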