Python Algorithm Problem Set: Find All Anagrams in a String

Author: 小桥流水78 | 2024-07-19 18:21:53

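Find All Anagrams in a String

This article is one of the code examples in my Python algorithm problem set.

Problem 438: Find All Anagrams in a String

Description: Given two strings s and p, find all substrings of s that are anagrams of p and return the start indices of those substrings. The order of the output does not matter.

An anagram is a string formed by rearranging the same letters (the identical string included).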

Example 1:

Input: s = "cbaebabacd", p = "abc"
Output: [0,6]
Explanation:
The substring starting at index 0 is "cba", which is an anagram of "abc".
The substring starting at index 6 is "bac", which is an anagram of "abc".

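Example 2:

Input: s = "abab", p = "ab"
Output: [0,1,2]
Explanation:
The substring starting at index 0 is "ab", which is an anagram of "ab".
The substring starting at index 1 is "ba", which is an anagram of "ab".
The substring starting at index 2 is "ab", which is an anagram of "ab".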

Constraints:

1 <= s.length, p.length <= 3 * 10^4
s and p contain only lowercase letters

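Problem analysis

Because p is fixed, checking whether a substring is an anagram of p only requires comparing character-count arrays (p contains only lowercase letters, so each array has just 26 elements).
Because the length of p is fixed, the window has a fixed size, so a single loop over s is enough.

Optimization ideas
1. Reduce computation
2. Speed up the comparison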
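
The counting idea in the analysis above can be sketched in a few lines. The helper names `letter_counts` and `is_anagram` are made up for this illustration and do not appear in the versions below; the sketch assumes the input contains only lowercase letters, as the constraints state.

```python
def letter_counts(text):
    """Return a 26-element list: how often each lowercase letter occurs."""
    counts = [0] * 26
    for ch in text:
        counts[ord(ch) - ord('a')] += 1
    return counts


def is_anagram(a, b):
    """Two strings are anagrams exactly when their letter counts match."""
    return letter_counts(a) == letter_counts(b)


print(letter_counts("abc")[:4])   # [1, 1, 1, 0]
print(is_anagram("cba", "abc"))   # True
print(is_anagram("bae", "abc"))   # False
```

Calling `is_anagram` on every substring would recount each window from scratch. The versions below avoid that by updating a single count array as the window slides.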

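Standard version [compare against the counts of p inside the loop]: good performance, beating 89% of submissions. Since the standard version is already fast, this problem leaves little room for optimization.

Note: CheckFuncPerf is a module I wrote by hand to measure a function's running time and memory usage. It can be downloaded from my post "测量函数运行用时、内存占用的代码单元CheckFuncPerf.py以及使用方法" (CheckFuncPerf.py, a code unit for measuring function run time and memory usage, and how to use it).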

import CheckFuncPerf as cfp

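def findAnagrams(s: str, p: str) -> list[int]:
    list_p = [0] * 26   # letter counts of p
    list_s = [0] * 26   # letter counts of the current window in s
    list_result = []
    for iIdx in range(len(p)):
        list_p[ord(p[iIdx])-ord('a')] += 1
    for iIdx in range(len(s)):
        # extend the window to include s[iIdx]
        list_s[ord(s[iIdx])-ord('a')] += 1
        if iIdx < len(p) - 1:
            continue    # the window is still shorter than p
        if list_s == list_p:
            list_result.append(iIdx - len(p) + 1)
        # drop the leftmost character before the window slides on
        list_s[ord(s[iIdx-len(p)+1]) - ord('a')] -= 1
    return list_result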

s, p = 'cbaebabacd', 'abc'
result = cfp.getTimeMemoryStr(findAnagrams, s, p)
print(result['msg'],'执行结果={}'.format(result['result']))
# Output
# 函数 findAnagrams 的运行时间为 0.00 ms；内存使用量为 4.00 KB 执行结果=[0, 6]
# (function findAnagrams ran in 0.00 ms; memory usage 4.00 KB; result=[0, 6])
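
For readers who do not have CheckFuncPerf, a rough stand-in can be built from the standard library. This is only a sketch: `measure` and its dictionary keys are hypothetical names, not the CheckFuncPerf API, and `find_anagrams` is a compact copy of the standard version included so the block runs on its own. Note that `tracemalloc` adds overhead, so the timing is indicative at best.

```python
import time
import tracemalloc


def measure(func, *args):
    """Hypothetical helper: run func(*args) once, report time and peak memory."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"result": result, "ms": elapsed_ms, "peak_kb": peak / 1024}


def find_anagrams(s, p):
    """Compact copy of the fixed-size sliding window shown above."""
    base = ord("a")
    need, have = [0] * 26, [0] * 26
    for ch in p:
        need[ord(ch) - base] += 1
    starts = []
    for i, ch in enumerate(s):
        have[ord(ch) - base] += 1
        if i < len(p) - 1:
            continue  # window not yet as long as p
        if have == need:
            starts.append(i - len(p) + 1)
        have[ord(s[i - len(p) + 1]) - base] -= 1
    return starts


report = measure(find_anagrams, "cbaebabacd", "abc")
print(f"{report['ms']:.2f} ms, peak {report['peak_kb']:.2f} KB, result={report['result']}")
```

On an input this small the elapsed time is close to the timer's resolution, which is consistent with the 0.00 ms readings in the outputs quoted in this article; a longer s gives more meaningful numbers.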

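Optimized version [at each step, check whether the current character is absent from p; if so, jump past it]: performance goes into free fall, beating only 42% of submissions.

This optimization depends on the characteristics of p: the longer p is, the more it helps; otherwise, because every character costs one extra check, performance actually drops.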

def findAnagrams_ext1(s: str, p: str) -> list[int]:
    list_p = [0] * 26
    list_s = [0] * 26
    list_result = []
    for iIdx in range(len(p)):
        list_p[ord(p[iIdx])-ord('a')] += 1
    iIdx, ileft = 0, 0  # ileft: start index of the current window
    while iIdx < len(s):
        if p.find(s[iIdx])<0:
            # s[iIdx] is not in p, so no window containing it can match:
            # zero the counts of the letters in the current window and jump past it
            if iIdx<len(p):
                for jIdx in range(iIdx):
                    list_s[ord(s[jIdx])-ord('a')] = 0
            else:
                for jIdx in range(len(p)):
                    list_s[ord(s[iIdx-jIdx])-ord('a')] = 0
            iIdx += 1
            ileft = iIdx
            continue
        list_s[ord(s[iIdx])-ord('a')] += 1
        if iIdx < len(p) + ileft - 1:
            # the window restarted at ileft and is still shorter than p
            iIdx += 1
            continue
        if list_s == list_p:
            list_result.append(iIdx - len(p) + 1)
        # drop the leftmost character and slide the window
        list_s[ord(s[iIdx-len(p)+1]) - ord('a')] -= 1
        ileft += 1
        iIdx += 1
    return list_result
    
s, p = 'cbaebabacd', 'abc'
result = cfp.getTimeMemoryStr(findAnagrams_ext1, s, p)
print(result['msg'],'执行结果={}'.format(result['result']))
# Output
# 函数 findAnagrams_ext1 的运行时间为 0.00 ms；内存使用量为 0.00 KB 执行结果=[0, 6]
# (function findAnagrams_ext1 ran in 0.00 ms; memory usage 0.00 KB; result=[0, 6])
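
Jump logic like this is easy to get subtly wrong, so it helps to cross-check it against a brute-force reference. The sketch below is not the author's code: `find_anagrams_skip` is a simplified rewrite of the jump idea that, as an assumption of this example, uses a `set` for the membership test instead of `p.find` and resets the whole count list in one step; `find_anagrams_bruteforce` simply sorts every window.

```python
import random


def find_anagrams_skip(s, p):
    """Sliding window that restarts after any character absent from p."""
    base = ord("a")
    m = len(p)
    need = [0] * 26
    for ch in p:
        need[ord(ch) - base] += 1
    allowed = set(p)          # assumption: set lookup replaces p.find
    have = [0] * 26
    left = 0                  # start index of the current window
    starts = []
    for right, ch in enumerate(s):
        if ch not in allowed:
            have = [0] * 26   # discard the window and restart after ch
            left = right + 1
            continue
        have[ord(ch) - base] += 1
        if right - left + 1 < m:
            continue          # window still shorter than p
        if have == need:
            starts.append(left)
        have[ord(s[left]) - base] -= 1
        left += 1
    return starts


def find_anagrams_bruteforce(s, p):
    """Reference answer: sort every window of length len(p)."""
    target = sorted(p)
    m = len(p)
    return [i for i in range(len(s) - m + 1) if sorted(s[i:i + m]) == target]


random.seed(438)
for _ in range(500):
    s = "".join(random.choice("abcde") for _ in range(random.randint(1, 30)))
    p = "".join(random.choice("abc") for _ in range(random.randint(1, 5)))
    assert find_anagrams_skip(s, p) == find_anagrams_bruteforce(s, p), (s, p)
print("skip variant agrees with brute force on 500 random cases")
```

The random strings draw from "abcde" while p draws from "abc", so roughly two in five characters trigger a jump and the restart path is exercised often.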

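Computation-optimized version [the standard version, with ord('a') computed once up front instead of on every use]: excellent performance, beating 97% of submissions.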

def findAnagrams_iorda(s: str, p: str) -> list[int]:
    iOrda = ord('a')
    list_p = [0] * 26
    list_s = [0] * 26
    list_result = []
    for iIdx in range(len(p)):
        list_p[ord(p[iIdx])-iOrda] += 1
    for iIdx in range(len(s)):
        list_s[ord(s[iIdx])-iOrda] += 1
        if iIdx < len(p) - 1:
            continue
        if list_s == list_p:
            list_result.append(iIdx - len(p) + 1)
        list_s[ord(s[iIdx-len(p)+1]) - iOrda] -= 1
    return list_result
    
s, p = 'cbaebabacd', 'abc'
result = cfp.getTimeMemoryStr(findAnagrams_iorda, s, p)
print(result['msg'],'执行结果={}'.format(result['result']))
# Output
# 函数 findAnagrams_iorda 的运行时间为 0.00 ms；内存使用量为 0.00 KB 执行结果=[0, 6]
# (function findAnagrams_iorda ran in 0.00 ms; memory usage 0.00 KB; result=[0, 6])
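
The effect of computing ord('a') once can be checked locally with a small `timeit` comparison. This is a sketch for illustration only: `count_inline` and `count_hoisted` are names invented here, and absolute timings depend on the machine and the Python version, so no expected numbers are given.

```python
import timeit


def count_inline(text):
    """Count letters, calling ord('a') on every iteration."""
    counts = [0] * 26
    for ch in text:
        counts[ord(ch) - ord('a')] += 1
    return counts


def count_hoisted(text):
    """Count letters, with ord('a') computed once before the loop."""
    base = ord('a')
    counts = [0] * 26
    for ch in text:
        counts[ord(ch) - base] += 1
    return counts


text = "cbaebabacd" * 200
for func in (count_inline, count_hoisted):
    seconds = timeit.timeit(lambda: func(text), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 runs")
```

Both functions return the same counts; only the number of calls to `ord` differs, which is the saving the computation-optimized version relies on.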

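Enhanced optimized version [the optimized version, with ord('a') computed once up front instead of on every use]: mediocre performance, beating 54% of submissions.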

def findAnagrams_ext1_iorda(s: str, p: str) -> list[int]:
    iOrda = ord('a')
    list_p = [0] * 26
    list_s = [0] * 26
    list_result = []
    for iIdx in range(len(p)):
        list_p[ord(p[iIdx])-iOrda] += 1
    iIdx, ileft = 0, 0
    while iIdx < len(s):
        if p.find(s[iIdx])<0:
            if iIdx<len(p):
                for jIdx in range(iIdx):
                    list_s[ord(s[jIdx])-iOrda] = 0
            else:
                for jIdx in range(len(p)):
                    list_s[ord(s[iIdx-jIdx])-iOrda] = 0
            iIdx += 1
            ileft = iIdx
            continue
        list_s[ord(s[iIdx])-iOrda] += 1
        if iIdx < len(p) + ileft - 1:
            iIdx += 1
            continue
        if list_s == list_p:
            list_result.append(iIdx - len(p) + 1)
        list_s[ord(s[iIdx-len(p)+1]) - iOrda] -= 1
        ileft += 1
        iIdx += 1
    return list_result
    
s, p = 'cbaebabacd', 'abc'
result = cfp.getTimeMemoryStr(findAnagrams_ext1_iorda, s, p)
print(result['msg'],'执行结果={}'.format(result['result']))
# Output
# 函数 findAnagrams_ext1_iorda 的运行时间为 0.00 ms；内存使用量为 0.00 KB 执行结果=[0, 6]
# (function findAnagrams_ext1_iorda ran in 0.00 ms; memory usage 0.00 KB; result=[0, 6])
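
All four versions above compare two 26-element lists at every window position. One commonly used way to speed up that comparison, which is not one of the versions measured in this article, is to track how many letters currently differ between the window and p, so each step only checks a single integer. The sketch below is an illustration under that assumption; `find_anagrams_diff` and `bump` are hypothetical names.

```python
def find_anagrams_diff(s, p):
    """Sliding window that tracks the number of letters whose counts differ."""
    m, n = len(p), len(s)
    if m > n:
        return []
    base = ord("a")
    # delta[k] = (count of letter k in the window) - (count of letter k in p)
    delta = [0] * 26
    for ch in p:
        delta[ord(ch) - base] -= 1
    for ch in s[:m]:
        delta[ord(ch) - base] += 1
    mismatched = sum(1 for d in delta if d != 0)

    def bump(ch, step):
        """Apply step (+1 or -1) to ch's delta; return the change in mismatches."""
        k = ord(ch) - base
        before = delta[k] != 0
        delta[k] += step
        after = delta[k] != 0
        return int(after) - int(before)

    starts = [0] if mismatched == 0 else []
    for left in range(n - m):
        mismatched += bump(s[left], -1)       # character leaving the window
        mismatched += bump(s[left + m], +1)   # character entering the window
        if mismatched == 0:
            starts.append(left + 1)
    return starts


print(find_anagrams_diff("cbaebabacd", "abc"))  # [0, 6]
print(find_anagrams_diff("abab", "ab"))         # [0, 1, 2]
```

Whether this beats a plain list comparison in CPython is something to measure rather than assume, since comparing two short lists is already fast.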

A day of practice is a day of progress; a day without practice undoes ten.

may the odds be ever in your favor ~
