I have a string of, say, 50 characters representing a sequence such as <code>abbcda....</code>, with letters taken from the set <code>A={a,b,c,d}</code>. I want to count how many times <code>b</code> is followed by another <code>b</code> (an n-gram with n=2). Similarly, I want to count how many times a particular character is repeated three times consecutively (n=3). For example, in the input string <code>abbbcbbb</code>, the number of times <code>b</code> occurs in a run of 3 letters is 2.


How to calculate word co-occurrence

1 Answer

To count the non-overlapping 2-grams of b (regexp returns the start index of each match, and numel counts them), you can use

numel(regexp(str, 'b{2}'))

and for 3-grams

numel(regexp(str, 'b{3}'))

To count overlapping 2-grams, use a positive lookahead, which checks for the following b without consuming it, so that b can start the next match

numel(regexp(str, '(b)(?=b{1})'))

and for overlapping n-grams

numel(regexp(str, ['(b)(?=b{' num2str(n-1) '})']))
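As a quick check of the difference between the two counts, here is a minimal sketch run on the example string from the question. The variable names nonOverlap and overlap are illustrative only, and the pattern for the non-overlapping case is built with num2str in the same way as the overlapping one.

```matlab
str = 'abbbcbbb';   % example string from the question
n = 2;              % length of the n-gram to count

% Non-overlapping: each match consumes n characters, so a run of
% three b's contributes only one 2-gram.
nonOverlap = numel(regexp(str, ['b{' num2str(n) '}']));       % 2

% Overlapping: only the first b is consumed; the remaining n-1 b's
% are checked by the lookahead, so a run of three b's contributes two.
overlap = numel(regexp(str, ['(b)(?=b{' num2str(n-1) '})']));  % 4
```

With n = 3 both expressions give 2 for this string, because neither run of b's is long enough for two 3-grams to overlap.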

EDIT: To count the occurrences of an arbitrary sequence, put its first character inside the parentheses and the rest of the sequence inside the lookahead, after the ?=. For example, to find ba use

numel(regexp(str, '(b)(?=a)'))

to find bda use

numel(regexp(str, '(b)(?=da)'))
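The same pattern can be built for any sequence instead of being typed by hand. The following is a sketch only: countSeq and esc are hypothetical helper names, not MATLAB functions, and the sequence is assumed to have at least two characters. regexptranslate('escape', ...) is used so that characters with a special meaning in regular expressions are matched literally.

```matlab
% Hypothetical helpers (not part of MATLAB); seq must have length >= 2.
esc = @(s) regexptranslate('escape', s);   % match special characters literally

% First character is matched, the rest is checked by the lookahead.
countSeq = @(str, seq) numel(regexp(str, [esc(seq(1)) '(?=' esc(seq(2:end)) ')']));

nBda  = countSeq('bdabdaba', 'bda');       % 2
nAba  = countSeq('ababa', 'aba');          % 2, the two occurrences share an a
plain = numel(regexp('ababa', 'aba'));     % 1, plain matching skips the overlap
```

The lookahead only changes the result for sequences that can overlap with themselves, such as aba; for a sequence like bda a plain regexp(str, 'bda') would give the same count.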


answered Oct 03 '22 19:10

Mohsen Nosratinia



Tags:

string

matlab

Srishti M
