
Count the number of occurrences of a string using sed?


I have a file that contains "title" many times. How can I use sed to count the occurrences of "title", counting only the lines where "title" is the first string on the line? For example:

# title
title
title

should output a count of 2, because on the first line "title" is not the first string.

Update

I used awk to find the total number of occurrences as:

awk '$1 ~ /title/ {++c} END {print c}' FS=: myFile.txt 

But how can I tell awk to count only the lines that have "title" as the first string, as in the example above?

Usman asked Nov 23 '09 05:11



1 Answer

Never say never. Pure sed (although it may require the GNU version).

#!/bin/sed -nf
# based on a script from the sed info file (info sed)
# section 4.8 Numbering Non-blank Lines (cat -b)
# modified to count lines that begin with "title"

/^title/! be

x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h

:e

$ {x;p}

Explanation:

#!/bin/sed -nf
# run sed without printing output by default (-n)
# using the following file as the sed script (-f)

/^title/! be        # if the current line doesn't begin with "title" branch to label e

x                   # swap the counter from hold space into pattern space
/^$/ s/^.*$/0/      # if pattern space is empty start the counter at zero
/^9*$/ s/^/0/       # if pattern space consists only of nines, prepend a zero
s/.9*$/x&/          # mark the position of the last digit before a sequence of nines (if any)
h                   # copy the marked counter to hold space
s/^.*x//            # delete everything up to and including the marker
y/0123456789/1234567890/   # increment the digits that were after the mark
x                   # swap pattern space and hold space
s/x.*$//            # delete the marker and everything after it, leaving the leading digits
G                   # append hold space to pattern space
s/\n//              # remove the newline, leaving all the digits concatenated
h                   # save the counter into hold space

:e                  # label e

$ {x;p}             # if this is the last line of input, swap in the counter and print it
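The carry handling is the least obvious part of the explanation above, so here is a small sketch that applies only the increment commands to a standalone number. The helper name `increment` and the sample values are assumptions made for illustration; they are not part of the original script.

```shell
#!/bin/sh
# increment: hypothetical helper that runs only the counter-increment
# commands from the script on the number passed as its first argument.
# Steps, in order: pad an all-nines value with a leading zero, mark the
# last digit that precedes any trailing nines, save the marked value,
# keep and rotate the digits after the mark, then rejoin them with the
# untouched leading digits.
increment() {
  printf '%s\n' "$1" | sed \
    -e '/^9*$/ s/^/0/' \
    -e 's/.9*$/x&/' \
    -e 'h' \
    -e 's/^.*x//' \
    -e 'y/0123456789/1234567890/' \
    -e 'x' \
    -e 's/x.*$//' \
    -e 'G' \
    -e 's/\n//'
}

# Try a plain digit, a single nine, and two values that need a carry.
for n in 0 9 129 199; do
  echo "$n -> $(increment "$n")"
done
```

This should print `0 -> 1`, `9 -> 10`, `129 -> 130` and `199 -> 200`. Only the digits after the marker are rotated by `y`, which is why a trailing run of nines rolls over to zeros while the digit before it goes up by one.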

Here are excerpts from a trace of the script using sedsed:

$ echo -e 'title\ntitle\nfoo\ntitle\nbar\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle\ntitle' | sedsed-1.0 -d -f ./counter

PATT:title$
HOLD:$
COMM:/^title/ !b e
COMM:x
PATT:$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:0$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:0$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x0$
HOLD:title$
COMM:h
PATT:x0$
HOLD:x0$
COMM:s/^.*x//
PATT:0$
HOLD:x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:x0$
COMM:x
PATT:x0$
HOLD:1$
COMM:s/x.*$//
PATT:$
HOLD:1$
COMM:G
PATT:\n1$
HOLD:1$
COMM:s/\n//
PATT:1$
HOLD:1$
COMM:h
PATT:1$
HOLD:1$
COMM::e
COMM:$ {
PATT:1$
HOLD:1$
PATT:title$
HOLD:1$
COMM:/^title/ !b e
COMM:x
PATT:1$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:1$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:1$
HOLD:title$
COMM:s/.9*$/x&/
PATT:x1$
HOLD:title$
COMM:h
PATT:x1$
HOLD:x1$
COMM:s/^.*x//
PATT:1$
HOLD:x1$
COMM:y/0123456789/1234567890/
PATT:2$
HOLD:x1$
COMM:x
PATT:x1$
HOLD:2$
COMM:s/x.*$//
PATT:$
HOLD:2$
COMM:G
PATT:\n2$
HOLD:2$
COMM:s/\n//
PATT:2$
HOLD:2$
COMM:h
PATT:2$
HOLD:2$
COMM::e
COMM:$ {
PATT:2$
HOLD:2$
PATT:foo$
HOLD:2$
COMM:/^title/ !b e
COMM:$ {
PATT:foo$
HOLD:2$
.
.
.
PATT:10$
HOLD:10$
PATT:title$
HOLD:10$
COMM:/^title/ !b e
COMM:x
PATT:10$
HOLD:title$
COMM:/^$/ s/^.*$/0/
PATT:10$
HOLD:title$
COMM:/^9*$/ s/^/0/
PATT:10$
HOLD:title$
COMM:s/.9*$/x&/
PATT:1x0$
HOLD:title$
COMM:h
PATT:1x0$
HOLD:1x0$
COMM:s/^.*x//
PATT:0$
HOLD:1x0$
COMM:y/0123456789/1234567890/
PATT:1$
HOLD:1x0$
COMM:x
PATT:1x0$
HOLD:1$
COMM:s/x.*$//
PATT:1$
HOLD:1$
COMM:G
PATT:1\n1$
HOLD:1$
COMM:s/\n//
PATT:11$
HOLD:1$
COMM:h
PATT:11$
HOLD:11$
COMM::e
COMM:$ {
COMM:x
PATT:11$
HOLD:11$
COMM:p
11
PATT:11$
HOLD:11$
COMM:}
PATT:11$
HOLD:11$

The ellipsis represents lines of output I omitted here. The line with "11" by itself is where the final count is output. That is the only output you would get when the sedsed debugger isn't being used.
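To see that single line of output without the debugger, the script can be saved to a file and passed to sed directly. The following is a minimal sketch that assumes GNU sed and `mktemp` are available; the temporary file and the `grep -c` cross-check are additions for illustration and not part of the original answer.

```shell
#!/bin/sh
# Write the counter script to a temporary file (the path is arbitrary).
script=$(mktemp)
cat > "$script" <<'EOF'
/^title/! be
x
/^$/ s/^.*$/0/
/^9*$/ s/^/0/
s/.9*$/x&/
h
s/^.*x//
y/0123456789/1234567890/
x
s/x.*$//
G
s/\n//
h
:e
$ {x;p}
EOF

# Sample input from the question: only two lines begin with "title".
count=$(printf '# title\ntitle\ntitle\n' | sed -n -f "$script")
echo "sed count:  $count"

# Cross-check: grep -c counts the lines that match the pattern.
check=$(printf '# title\ntitle\ntitle\n' | grep -c '^title')
echo "grep count: $check"

rm -f "$script"
```

Both commands should report 2 for this input. One difference worth knowing: if no line begins with "title", the sed script prints an empty line because the counter was never initialized, whereas `grep -c` prints 0.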

Dennis Williamson answered Sep 21 '22 01:09