【leetcode题解】30.串联所有单词的子串

题目描述

给定一个字符串s和一个字符串数组words。words中所有字符串长度相同。

s中的 串联子串 是指一个包含words中所有字符串以任意顺序排列连接起来的子串。

例如，如果words = ["ab","cd","ef"]，那么"abcdef"，"abefcd"，"cdabef"，"cdefab"，"efabcd"，和"efcdab"都是串联子串。"acdbef"不是串联子串，因为他不是任何words排列的连接。

返回所有串联子串在s中的开始索引。你可以以 任意顺序 返回答案。

解题思路

首先不妨假定一下字符串s的长度为n，单词列表words长度为l，以及每个单词的长度为w。由于子串由words中的单词以任意顺序排列构成，所以长度是确定的，即 $w\times l$ 。那么我们就可以对字串开头进行枚举，而后针对每一个子串，分别统计其中的单词是不是words的一个排列。这样的算法枚举子串开头的时间复杂度是 $O(n)$ ，循环内层的判定的时间复杂度是 $O(l)$ ，算法总的复杂度是 $O(nl)$ .

有一个明显的可以优化的点在于，假设子串起始下标为st，则当我们检查完[st, st+wl-1]的子串后，我们下一个检查的子串是[st+1, st+wl]。可以看到，检查第二个子串的过程中，其实一大部分是我们在前一次检查中已经扫描过的值，唯一的变化在于子串起始位置和末了位置产生了一定的偏移。因此，可以考虑滑动窗口的方法。

但在此题中，直接使用滑动窗口似乎没那么好用。究其原因在于当子串左右端点移动时只移动了一个字符，而我们维护的却是以word为单元的出现次数。因此，要使用滑动窗口方法，我们需要将每w个字符视作一个单元；子串端点移动时，也以w个字符为步长进行移动。

当然，由于滑动窗口的出发点是下标为 $0$ 的位置，这样做会使得我们的子串起始点仅能位于w的整数倍处。一个解决方法是我们枚举所有可能的起始点，即模 $w$ 的剩余类，这样，子串就可以取到所有可能起始点了。

这般，我们枚举可能起始点的时间复杂度为 $O(w)$ （即 $0$ 到 $w-1$ ），每一次枚举后使用滑动窗口来判断能否构成满足要求的子串，时间复杂度为 $O(n/w)$ （因为我们将w个字符视作了一个单元，因此总长度为 $n/w$ ）。整个算法时间复杂度为 $O(n)$ 。

代码（C++）

class Solution {
    // 用于重置哈希表计数器的函数
    void reset_map(unordered_map<string, int>& mp){
        for(auto &kv: mp){
            kv.second = 0;
        }
        return;
    }

    // 从下标st开始的滑动窗口
    void findSubstringAt(string& s, vector<string>& words, int st, vector<int>& ans, unordered_map<string, int>& target, unordered_map<string, int>& mp){
        int w = words[0].length(), l = words.size(), n = s.length();
        int cur = st;
        while(cur <= n - w){
            string substr = string(s, cur, w);
            auto it = mp.find(substr);
            if(it == mp.end()){
                st = cur + w;
                reset_map(mp);
            }else{
                mp[substr] ++;
                while(mp[substr] > target[substr]){
                    mp[string(s, st, w)] --;
                    st += w;
                }
                if(cur - st == w*(l-1)){
                    ans.push_back(st);
                }
            }

            cur += w;
        }

        return;
    }
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        int n = s.length(), l = words.size(), w = words[0].length();
        vector<int> ans;
        unordered_map<string, int> target;  // 目标每个单词的出现次数
        unordered_map<string, int> mp;  // 窗口中每个单词的出现个数
        for(auto word: words){
            mp[word] = 0;
        }
        for(auto word: words){
            if(target.find(word) != target.end()) target[word] ++;
            else target[word] = 1;
        }

        // 对所有可能的出发点，调用findSubstringAt()函数
        for(int i=0; i<w; ++i){
            reset_map(mp);
            findSubstringAt(s, words, i, ans, target, mp);
        }

        return ans;

    }
};

时空复杂度