Count number of occurrences of token in a file

Tags:

grep

bash

shell

I have a server access log with a timestamp for each HTTP request, and I'd like to count the number of requests in each second. Using sed and cut -c, I've managed so far to cut the file down to just the timestamps, such as:

22-Sep-2008 20:00:21 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:24 +0000
22-Sep-2008 20:00:24 +0000

What I'd love to get is the number of times each unique timestamp appears in the file. For the sample above, I'd like output that looks like:

22-Sep-2008 20:00:21 +0000: 1
22-Sep-2008 20:00:22 +0000: 3
22-Sep-2008 20:00:24 +0000: 2

I've used sort -u to reduce the list of timestamps to its unique values, hoping that I could then use grep like this:

grep -c -f <file containing patterns> <file>

but this only produces a single grand total of matching lines.
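grep -c counts matching lines, and with -f a line that matches any of the patterns is counted once, which is why it returns one combined total. The grep idea does work if grep is run once per unique timestamp. A minimal sketch, assuming the timestamps have already been extracted to a file; the file name timestamps.txt and the function name count_timestamps are hypothetical. It rescans the file once per unique value, so it gets slow on large logs.

```shell
# Sample input: the extracted timestamps from the question.
# "timestamps.txt" is a hypothetical file name.
cat > timestamps.txt <<'EOF'
22-Sep-2008 20:00:21 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:24 +0000
22-Sep-2008 20:00:24 +0000
EOF

# count_timestamps FILE (hypothetical helper): for each unique line,
# print "line: count" by running grep once per line.
#   -F  treat the timestamp as a fixed string, not a regex
#   -x  require a whole-line match
#   -c  print the number of matching lines
count_timestamps() {
    sort -u -- "$1" | while IFS= read -r ts; do
        printf '%s: %s\n' "$ts" "$(grep -cFx -- "$ts" "$1")"
    done
}

count_timestamps timestamps.txt
```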

I know this can be done in a single line by stringing a few utilities together, but I can't think of which ones. Does anyone know?

matt b asked Sep 24 '08 16:09



1 Answer

I think you're looking for

uniq --count

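-c, --count prefix lines by the number of occurrences

Note that uniq only merges adjacent duplicate lines, so input that isn't already grouped needs to go through sort first. Your timestamps are in time order, so identical ones are already adjacent.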
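Applied to the question's data, a minimal sketch of the full pipeline: sort groups identical timestamps, uniq -c prefixes each distinct line with its count, and a short awk step (an addition, not part of the answer above) moves the count to the end to match the requested "timestamp: count" format. timestamps.txt and count_per_second are hypothetical names; the awk regex assumes uniq -c writes optional leading spaces, the count, and one space before the line, which matches the POSIX "%d %s" form and GNU's padded variant.

```shell
# Sample input: the extracted timestamps from the question.
# "timestamps.txt" is a hypothetical file name.
cat > timestamps.txt <<'EOF'
22-Sep-2008 20:00:21 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:24 +0000
22-Sep-2008 20:00:24 +0000
EOF

# count_per_second FILE (hypothetical helper):
#   sort     puts identical timestamps next to each other
#   uniq -c  collapses each run to one line, e.g. "      3 22-Sep-2008 20:00:22 +0000"
#   awk      saves the count, strips the "   3 " prefix, prints "line: count"
count_per_second() {
    sort -- "$1" | uniq -c | awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 ": " n }'
}

count_per_second timestamps.txt
```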

The Archetypal Paul answered Oct 27 '22 00:10
