Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Number of Distinct substring of a string

I was solving DISTINCT SUBSTRING (given a string, we need to find the total number of its distinct substrings).
I am using trie of suffixes to solve it.
I am passing the test cases, but getting TLE when I submit. Also, the space consumed is very large, at 4093M.

Note: Since there can be 256 char in total, I am setting an array size of 257, and the ascii value is acting as the index.

What I am thinking now:

for(int i=0;i<p;i++){
    string temp = str.substr(i,p-i);
    insert1(root,temp);
}

Since substr() may take O(n) time, in the worst case insert function also takes (n) time in the worst case, and O(n) for loop: O(n^3). This is getting me TLE.

error: could not convert 'temp' from 'std::__cxx11::string* {aka std::__cxx11::basic_string*}' to 'std::__cxx11::string {aka std::__cxx11::basic_string}'| ||=== Build failed: 2 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|

So I am thinking to replace substr() by something like this:

for(int i=0;i<p;i++){
     string *temp = &str[i];
     insert1(root,temp);
}  ///and it is giving me error please suggest here what is the mistake and what to do

So that I can save O(n) time.

Please tell me how can I modify my trie approach so it gets accepted.

#include<iostream>
#include<string>
using namespace std;

const int alphabetsize = 257;
int cnt=0;
struct trienode{
    struct trienode* children[alphabetsize];
    bool isendofword;
};

struct trienode *getnode(void){
    struct trienode *newnode = new trienode;
    newnode->isendofword = false;
    for(int i=0;i<alphabetsize;i++){
        newnode->children[i]=NULL;
    }
    return newnode;
}
void insert1(struct trienode* root,string &key){
    struct trienode *temp = root;
    for(int i=0;i<key.length();i++){
        int index = key[i];
        if(!temp->children[index]){
            temp->children[index]=getnode();
            cnt++;
        }
        temp = temp->children[index];
    }
    temp->isendofword=true;
}
int main(){
    int t;
    cin>>t;
    while(t--){
        cnt=0;
        string str;
        cin>>str;
        int p = str.length();
        struct trienode* root = getnode();
        for(int i=0;i<p;i++){
            string temp = str.substr(i,p-i);
            insert1(root,temp);
        }
        cout<<cnt<<endl;
    }

}
like image 521
aka1234 Avatar asked Feb 14 '19 07:02

aka1234


People also ask

How do you find the number of distinct substrings in a string?

Number of substrings of a string of length n is (n * (n + 1) / 2).

How many substrings are there?

The simplest approach is to generate all possible substrings from the given string and check for each substring whether it contains all distinct characters or not. If the length of string is N, then there will be N*(N+1)/2 possible substrings.

How do I get a unique substring in Python?

Basically you use itertools to first find all combinations, then set to find unique elements, then cast to list. Edit for a clearer explanation: First, use combinations to get all pair of indexes corresponding to substrings. The trick here is that itertools.


1 Answers

I'm not experienced in C++ but the error you commented about seems like it might stem from different expectations by the compiler for the type of variable it's receiving when encountering the variable temp.

As others have noted in SPOJ and in the comments, since the input length is only 256 characters, you could possibly get away with brute-force hashing and counting all of the substring occurences.

Another option is to examine the longest common prefixes in the suffix array for the string, both of which there are known construction algorithms for. If we iterate from the end of the suffix array, the difference between the current suffix length and the longest common prefix with its neighbour to the right tells us how many new distinct substrings are introduced.

For example:

01234
CCCCC

suffix array:
01234
43210
CCCCC
 CCCC
  CCC
   CC
    C

i: 4 -> 5 new substrings
i: 3 -> lcp(3,4) = len(3) no new substrings
i: 2 -> lcp(2,3) = len(2) no new substrings
i: 1 -> lcp(1,2) = len(1) no new substrings
i: 0 -> lcp(0,1) = len(0) no new substrings

total: 5 distinct substrings

Second example:

01234
ABABA

suffix array:
01234
42031
AAABB
 BBAA
 AA B
  B A
  A

i: 4 -> 4 new substrings
i: 3 -> lcp(3,4) = len(3) no new substrings
i: 2 -> 5 new substrings
i: 1 -> lcp(1,2) = len(1) no new substrings
i: 0 -> lcp(0,1) = len(0) no new substrings

total: 9 distinct substrings
like image 80
גלעד ברקן Avatar answered Oct 22 '22 15:10

גלעד ברקן