Substring with Concatenation of All Words

Sources: leetcode 30

Rating: hard

Given a string, s, and a list words that are all of the same length. Find all starting indices of substrings in s that is a concatenation of each word in words exactly once and without any intervening characters.

def findSubstring(s: str, words: List[str]) -> List[int]:
    # return the starting indices in s where the substring is a concatenation
    # of each word in words without repetition or intervening characters

Solution

The fact that all the words in words are the same length is an advantage to use.

def findSubstring(s, words):
    result = []
    wordlen = len(words[0])
    # add starting indices into result
    return result

The length of each individual word wordlen serves as a modulo to easily limit the number of checks. Each letter position in a word can be treated as an index i that will repeat for each word in words. A counter freq is needed to keep track of the words to be matched so that there is one and only one instance. To check for each word, use two pointers l and r, as well as a count to keep track of the number of words to match.

from collections import Counter # to easily count frequency
def findSubstring(s, words):
    result = []
    wordlen = len(words[0])
    for i in range(wordlen):
        freq = Counter(words) # the frequency of each word
        l, r, count = i, i, len(words) # two pointers and number of words to match
        # check if matching, if so add to result
    return result

The right pointer r provides an easy stopping point, as it must stop at the end of s. Then, each word to check begins from r and continues until the length of the word wordlen. If this word is indeed in the counter freq, the counter must go down, as must the count of the words left to find, and the right pointer r skips ahead by the length wordlen to begin searching for the next word. If the count reaches zero, then all the words in words has been found and the left pointer l is recorded into result as one of the starting indices.

from collections import Counter
def findSubstring(s, words):
    if not words:
        return []
    result = []
    wordlen = len(words[0])
    for i in range(wordlen):
        freq = Counter(words)
        l, r, count = i, i, len(words)
        while r < len(s):
            curr_word = s[r:r + wordlen]
            if curr_word in freq:
                freq[curr_word] -= 1
                if freq[curr_word] >= 0:
                    count -= 1
            r += wordlen
            if count == 0:
                result.append(l)
            # ensure consecutive words
    return result

Next, ensure the words are consecutive, and move the left pointer l each time the search is restarted.

Final Code

from collections import Counter
def findSubstring(s, words):
    if not words:
        return []
    result = []
    wordlen = len(words[0])
    for i in range(wordlen):
        freq = Counter(words)
        l, r, count = i, i, len(words)
        while r < len(s):
            curr_word = s[r:r + wordlen]
            if curr_word in freq:
                freq[curr_word] -= 1
                if freq[curr_word] >= 0:
                    count -= 1
            r += wordlen
            if count == 0:
                result.append(l)
            if r - l == wordlen * len(words):
                curr_word = s[l:l+wordlen]
                if curr_word in freq:
                    freq[curr_word] += 1
                    if freq[curr_word] > 0:
                        count += 1
                l += wordlen
    return result

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s