[leetcode] 28. Find the Index of the First Occurrence in a String

Author: Cpp五条 | 2024-06-04 00:12:54


Find the Index of the First Occurrence in a String

Contents

Problem Description
Solutions

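Problem Description

Given two strings haystack and needle, return the index of the first occurrence of needle in haystack (indices start at 0). If needle is not part of haystack, return -1.

Example 1:

Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at indices 0 and 6.
The first occurrence is at index 0, so we return 0.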

Example 2:

Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" does not occur in "leetcode", so we return -1.

Constraints:

1 <= haystack.length, needle.length <= 10⁴
haystack and needle consist only of lowercase English letters

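Solutions

Method 1: Two Pointers

Start matching needle from its first character against haystack from its first character. As soon as a character fails to match, move the starting position in haystack one step forward from the previous starting position and reset needle to its first character for the next attempt. If all of needle matches before haystack runs out, return the index in haystack where that match started; otherwise return -1.

Java code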

public int strStr(String haystack, String needle) {
    for (int i = 0; i <= haystack.length() - needle.length(); i++) {
        for (int j = 0; j < needle.length(); j++) {
            if (haystack.charAt(i + j) != needle.charAt(j)) {
                break;
            }
            if (j == needle.length() - 1) {
                return i;
            }
        }
    }
    return -1;
}

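Complexity Analysis

Time complexity: let n be the length of haystack and m the length of needle. In the worst case there are n - m + 1 starting positions in haystack, and each attempt compares up to m characters of needle, so the asymptotic time complexity is $O(m \times (n - m + 1))$.
Space complexity: $O(1)$; nothing needs to be stored beyond the two pointers.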
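
To make the worst case concrete, here is a minimal sketch that runs the same scan as Method 1 but counts character comparisons instead of only returning an index. The class name `BruteForceCount` and the method `countComparisons` are illustrative names, not part of the original solution.

```java
final class BruteForceCount {
    // Same scan as Method 1, but returns how many character comparisons were made.
    static long countComparisons(String haystack, String needle) {
        long comparisons = 0;
        for (int i = 0; i <= haystack.length() - needle.length(); i++) {
            int j = 0;
            while (j < needle.length()) {
                comparisons++;
                if (haystack.charAt(i + j) != needle.charAt(j)) {
                    break; // mismatch: give up on this starting position
                }
                j++;
            }
            if (j == needle.length()) {
                return comparisons; // full match found starting at i
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // n = 8, m = 4: each of the 5 starting positions fails only on the last character.
        System.out.println(countComparisons("aaaaaaaa", "aaab")); // prints 20
    }
}
```

With haystack = "aaaaaaaa" and needle = "aaab", each of the 5 starting positions costs 4 comparisons, for 20 in total. Inputs of this shape are what drive the scan toward its worst case.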

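Method 2: KMP Algorithm

In Method 1, every time haystack and needle fail to match, haystack moves one step forward from the previous starting position and is matched against needle from scratch. Suppose the previous attempt matched the first k characters of needle. Is there a way to avoid sending haystack back to the position right after the previous start, and instead stay at the haystack position k characters past the start and continue matching against needle from there? Yes: this is what the KMP algorithm does.

The core of KMP is the longest prefix-suffix match, that is, the length of the longest proper suffix that is also a prefix. What does that mean?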

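Take the string $aabaaaba$ and let the array $next$ hold this length for every position; we want to compute $next[i]$. $next[i]$ is the length of the longest suffix of the substring that ends just before position $i$ (position $i$ itself is excluded) that matches a prefix of the original string. The suffix must not be the whole substring itself, so it may not start at index 0.

When $i = 0$, there are no characters before $i$, so $next[0] = 0$.
When $i = 1$, the only character before $i$ is $a$, which is the very same $a$ at the start of the string. Since the suffix would coincide with the prefix itself, we also set $next[1] = 0$.
When $i = 2$, the character just before $i$ is $a$; the suffix $a$ matches the prefix $a$, so $next[2] = 1$.
When $i = 3$, the character just before $i$ is $b$; a suffix ending in $b$ cannot match the prefix $a$ or $aa$, so $next[3] = 0$.
When $i = 4$, the character just before $i$ is $a$; the suffix $a$ matches only the prefix $a$, so $next[4] = 1$.
When $i = 5$, the character just before $i$ is $a$; the suffix $aa$ matches the prefix $aa$, so $next[5] = 2$.
When $i = 6$, the character just before $i$ is $a$; the suffix $aa$ matches the prefix $aa$, while the longer suffix $aaa$ does not match the prefix $aab$, so $next[6] = 2$.
When $i = 7$, the character just before $i$ is $b$; the suffix $aab$ matches the prefix $aab$, so $next[7] = 3$.

This gives the value of $next$ at every position: $[0, 0, 1, 0, 1, 2, 2, 3]$.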
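
The walkthrough above can be checked mechanically. The sketch below computes the array straight from the definition by trying every candidate suffix length; it is deliberately naive (roughly cubic time) and is not how KMP builds the array. The class name `NextByDefinition` and the method `build` are illustrative names.

```java
import java.util.Arrays;

final class NextByDefinition {
    // next[i] = length of the longest proper suffix of s[0..i-1] that is also a prefix of s.
    // Written to mirror the definition, not for speed.
    static int[] build(String s) {
        int[] next = new int[s.length()]; // next[0] and next[1] stay 0
        for (int i = 2; i < s.length(); i++) {
            // Try lengths from the longest allowed (i - 1) down to 1.
            // Length i is excluded because that suffix would be all of s[0..i-1].
            for (int len = i - 1; len >= 1; len--) {
                if (s.substring(0, len).equals(s.substring(i - len, i))) {
                    next[i] = len;
                    break;
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("aabaaaba"))); // [0, 0, 1, 0, 1, 2, 2, 3]
    }
}
```

A brute-force version like this is mainly useful as a reference when testing a faster construction, since both should produce the same array for any input.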

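What is the $next$ array good for? Let us look at an example of matching haystack against needle.

Let needle again be $\color{red}aabaaaba$, and let haystack be $\color{red}aabaaabb$ $\color{blue}aabaaaba$. After matching from the start of haystack through $\color{red}aabaaabb$, the final $b$ in haystack does not match the final $a$ of needle. Without any extra information, we would move haystack to its second character and restart needle from the beginning. With the $next$ array, the mismatch is at needle index $i = 7$, and we look up $next[7] = 3$: the last three matched characters $aab$ of haystack already equal the prefix $aab$ of needle, so matching can resume within the suffix $\color{red}aabb$ by comparing its last character against the needle character at $i = 3$. That character is $a$, which does not match the $b$, and $next[3] = 0$, so no suffix matches a prefix of needle any more. haystack then compares that same $b$, the last character of $aabb$, against needle from its beginning. Throughout this process, once the matching position in haystack has reached the k-th character, haystack never has to go back before the k-th character and start over; only the matching position in needle moves, and it is compared against the k-th character of haystack. This greatly reduces matching time.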
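
The example above can be traced with a small sketch. It follows the same two-pointer KMP loop the article describes, but also counts comparisons and can print each fallback. The class name `KmpTrace`, the methods `buildNext` and `search`, and the `verbose` flag are illustrative names introduced here.

```java
final class KmpTrace {
    // Builds next[] with the usual KMP fallback; entries 0 and 1 stay 0 by default.
    static int[] buildNext(char[] p) {
        int[] next = new int[p.length];
        int i = 2;
        int cnt = 0;
        while (i < p.length) {
            if (p[i - 1] == p[cnt]) {
                next[i++] = ++cnt;
            } else if (cnt > 0) {
                cnt = next[cnt];
            } else {
                next[i++] = 0;
            }
        }
        return next;
    }

    // Returns {index of first match or -1, number of character comparisons}.
    static int[] search(String haystack, String needle, boolean verbose) {
        char[] h = haystack.toCharArray();
        char[] p = needle.toCharArray();
        int[] next = buildNext(p);
        int i1 = 0;
        int i2 = 0;
        int comparisons = 0;
        while (i1 < h.length && i2 < p.length) {
            comparisons++;
            if (h[i1] == p[i2]) {
                i1++;
                i2++;
            } else if (i2 > 0) {
                if (verbose) {
                    System.out.println("mismatch at i1=" + i1 + ", i2=" + i2 + " -> i2=" + next[i2]);
                }
                i2 = next[i2]; // i1 stays where it is
            } else {
                i1++; // nothing matched; advance in haystack
            }
        }
        return new int[]{i2 == p.length ? i1 - i2 : -1, comparisons};
    }

    public static void main(String[] args) {
        int[] r = search("aabaaabb" + "aabaaaba", "aabaaaba", true);
        System.out.println("index=" + r[0] + ", comparisons=" + r[1]);
    }
}
```

On this input the match is found at index 8 after 18 comparisons: 7 matches, 3 failed comparisons against the same haystack character at index 7, then 8 more matches. The haystack pointer `i1` is never decremented.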

Java code

public int strStr(String haystack, String needle) {
    if (haystack == null || needle == null ||
            haystack.length() < needle.length()) {
        return -1;
    }
    if (needle.length() == 0) {
        return 0;
    }
    char[] str1 = haystack.toCharArray();
    char[] str2 = needle.toCharArray();
    int[] next = getNextArr(str2);
    int i1 = 0;
    int i2 = 0;
    while (i1 < str1.length && i2 < str2.length) {
        if (str1[i1] == str2[i2]) {
            i1++;
            i2++;
        } else if (i2 > 0) {
            // When haystack and needle mismatch at needle index i2, first move i2 back to next[i2].
            // At this point needle[0 .. i2-1] matches haystack[i1-i2 .. i1-1].
            i2 = next[i2];
        } else {
            i1++;
        }
    }
    return i2 == str2.length ? i1 - i2 : -1;
}

// Compute the next array (longest prefix-suffix match lengths) used by KMP
public int[] getNextArr(char[] str) {
    if (str.length == 1) {
        return new int[]{0};
    }
    // next[i] is the length of the longest suffix of str[0 .. i-1] (i excluded) that matches a prefix. The suffix may not start at index 0.
    int[] next = new int[str.length];
    // At indices 0 and 1 there is no suffix that can match a prefix
    next[0] = 0;
    next[1] = 0;
    int i = 2;
    // Length of the prefix matched so far
    int cnt = 0;
    while (i < next.length) {
        if (str[i - 1] == str[cnt]) {
            next[i++] = ++cnt;
        } else if (cnt > 0) {
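            // The last character of the current suffix does not match: shift the suffix start further right
            // by falling back to next[cnt] (the shorter suffix of str still matches the prefix before cnt)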
            cnt = next[cnt];
        } else {
            next[i++] = 0;
        }
    }
    return next;
}

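Complexity Analysis

Time complexity: let n be the length of haystack and m the length of needle. The pointer into haystack never moves backward, so haystack is traversed once, and needle is likewise traversed once while building $next$; the time complexity is $O(m + n)$.
Space complexity: $O(m)$, for the $next$ array.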

