Substring with Concatenation of All Words - C Solution

1. Introduction

In this post, we explore an intriguing string manipulation problem in C programming: finding substrings in a given string 's' that are concatenations of all permutations of an array of strings 'words'. This problem is a great exercise in understanding string processing, hash tables, and combinatorics in programming.

Problem

Given a string 's' and an array of strings 'words', where all strings in 'words' are of the same length, the task is to find all starting indices in 's' where a substring is a concatenation of each string in 'words' exactly once. Such a substring must include all the strings in 'words' in any order, without overlapping.

2. Solution Steps

1. Calculate the total length of all strings in 'words' combined.

2. Iterate over 's' in windows of this total length.

3. In each window, check if every word in 'words' appears exactly once using a hash table.

4. If a window contains all words exactly once, add the starting index of this window to the result list.

3. Code Program

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

// Function to add a word to the hash table
void addWord(char *word, int wordLength, int *hashTable) {
    unsigned int hash = 0;
    for (int i = 0; i < wordLength; i++) {
        hash = (hash * 131) + word[i]; // Simple hash function
    }
    hashTable[hash % 1000]++; // Increment count in hash table
}

// Function to check if all words are in the substring
int checkWords(char *s, int start, int wordLength, int wordCount, int *hashTable) {
    int tempTable[1000] = {0}; // Temporary hash table for checking
    for (int i = 0; i < wordCount; i++) {
        addWord(s + start + i * wordLength, wordLength, tempTable); // Add each word to temp table
    }

    for (int i = 0; i < 1000; i++) {
        if (tempTable[i] != hashTable[i]) return 0; // Check if counts match
    }
    return 1;
}

// Main function to find all starting indices
int* findSubstring(char *s, char **words, int wordsSize, int *returnSize) {
    int wordLength = strlen(words[0]); // Length of each word
    int totalLength = wordsSize * wordLength; // Total length of concatenated words
    int *result = malloc(strlen(s) * sizeof(int)); // Array to store indices
    *returnSize = 0;

    int hashTable[1000] = {0}; // Hash table for words
    for (int i = 0; i < wordsSize; i++) {
        addWord(words[i], wordLength, hashTable); // Add each word to hash table
    }

    for (int i = 0; i <= strlen(s) - totalLength; i++) { // Iterate over 's'
        if (checkWords(s, i, wordLength, wordsSize, hashTable)) { // Check if window contains all words
            result[*returnSize] = i; // Add index to result
            (*returnSize)++;
        }
    }
    return result;
}

int main() {
    char *s = "barfoothefoobarman";
    char *words[] = {"foo", "bar"};
    int returnSize;
    int *indices = findSubstring(s, words, 2, &returnSize); // Find substrings

    for (int i = 0; i < returnSize; i++) {
        printf("%d\n", indices[i]); // Print each index
    }
    free(indices); // Free allocated memory
    return 0;
}

Output:

0
9

Explanation:

The program first creates a hash table for the 'words' array, mapping each word to its frequency. 

Then, it iterates over the string 's' in windows of length equal to the total length of all words combined. In each window, it checks if every word from 'words' appears exactly once. This is done by creating a temporary hash table for each window and comparing it with the original hash table. If they match, it means the window contains a concatenation of all words in some order, and the starting index of this window is added to the result list. 

In the provided example, "barfoothefoobarman" contains the concatenation "barfoo" and "foobar" at indices 0 and 9, respectively.


Comments