Number of Distinct Substrings in a String - CPP Solution

1. Introduction

This blog post focuses on an intriguing problem in string processing and data structures in C++: counting the number of distinct substrings in a given string. This problem is a classic example demonstrating the use of advanced data structures like Tries or Suffix Trees for efficient string manipulation and storage.

Problem

Given a string, the objective is to count the total number of distinct substrings it contains. A substring is any contiguous sequence of characters within the string.

2. Solution Steps

1. Create a Trie (a tree-like data structure) to store all possible substrings.

2. Iterate through the string and, for each character, add all substrings starting from that character to the Trie.

3. Ensure that when adding a substring, if a node already exists for a character, move to the next node instead of creating a new one. This helps in avoiding duplicate substrings.

4. Count the number of nodes in the Trie, which represents the number of distinct substrings.

5. Return the count.

3. Code Program

#include <iostream>
#include <string>
#include <vector>
using namespace std;

class TrieNode {
public:
    vector<TrieNode*> children;

    TrieNode() : children(26, nullptr) {}

    ~TrieNode() {
        for (auto child : children) {
            delete child;
        }
    }
};

// Function to add a substring to the Trie
void addSubstring(TrieNode* root, const string& s, int start) {
    TrieNode* node = root;
    for (int i = start; i < s.length(); i++) {
        int index = s[i] - 'a';
        if (!node->children[index]) {
            node->children[index] = new TrieNode();
        }
        node = node->children[index];
    }
}

// Function to count distinct substrings in a string
int countDistinctSubstrings(string s) {
    TrieNode* root = new TrieNode();
    int count = 0;

    for (int i = 0; i < s.length(); i++) {
        addSubstring(root, s, i);
    }

    // Count nodes in the Trie
    vector<TrieNode*> stack = {root};
    while (!stack.empty()) {
        TrieNode* node = stack.back();
        stack.pop_back();
        for (auto child : node->children) {
            if (child) {
                stack.push_back(child);
                count++;
            }
        }
    }

    delete root;
    return count;
}

int main() {
    string s = "ababa";
    cout << "Number of distinct substrings: " << countDistinctSubstrings(s) << endl;
    return 0;
}

Output:

Number of distinct substrings: 10

Explanation:

1. The input string "ababa" is analyzed to find the total number of distinct substrings.

2. A Trie is used to store every possible substring of the string.

3. Each substring starting from different indices is added to the Trie.

4. The Trie structure ensures that only distinct substrings are counted.

5. The total count of distinct substrings in "ababa" is 10, which includes substrings like "a", "ab", "aba", "abab", "ababa", "b", "ba", "bab", "baba", and "a".

6. This method demonstrates an efficient approach to counting distinct substrings using a Trie data structure.


Comments