match.group(): returns the matched string; the group number defaults to 0, which means the whole match.
re.sub('pattern', 'replacement', 'string')
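1. Data mining: find a small amount of text within a large body of text.
2. Validation: use a regular expression to confirm that the data received is what you expect, e.g. whether a username is valid.
Note: use regular expressions sparingly; if a simpler method exists, use it instead.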
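The validation use case and the "prefer the simpler method" advice can be sketched as follows. The username rule here (3 to 16 characters, starting with a letter) is a made-up assumption for illustration, as are the function names; a real site would have its own rule.

```python
import re

# Hypothetical rule: 3-16 characters, starts with a letter,
# followed by letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]{2,15}")

def is_valid_username(name):
    """Return True only if the whole string satisfies the rule above."""
    return USERNAME_RE.fullmatch(name) is not None

def mentions_sanchuang(text):
    """A plain substring test: no regular expression needed."""
    return "sanchuang" in text

print(is_valid_username("san_chuang01"))   # True
print(is_valid_username("1abc"))           # False: starts with a digit
print(mentions_sanchuang("hello world this is sanchuang"))  # True
```

`re.fullmatch` is used so that a valid prefix followed by junk is rejected; the substring check shows a case where the `in` operator does the job and a regex would be overkill.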
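Specify a matching rule, then check whether that rule occurs in a larger text string, e.g. grep "xxx" file
A regular expression can tell whether text matching the rule exists; it can also split a rule into one or more sub-rules and show the text matched by each sub-rule.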
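As a small sketch of the "sub-rules" idea: each pair of parentheses in a pattern forms a capture group, and the match object reports the text matched by each one. The sample sentence is the one used elsewhere in this post; the two-group pattern is chosen here only for illustration.

```python
import re

msg = "It's raining cats and dogs"

# Two parenthesized sub-patterns (capture groups) inside one pattern.
match = re.search(r"(cats) and (dogs)", msg)

whole = match.group()     # same as match.group(0): the entire match
first = match.group(1)    # text matched by the first group
second = match.group(2)   # text matched by the second group
both = match.groups()     # all groups as a tuple

print(whole)   # cats and dogs
print(first)   # cats
print(second)  # dogs
print(both)    # ('cats', 'dogs')
```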
- >>> import re
- >>> re.search("sanchuang","hello world this is sanchuang")
- <_sre.SRE_Match object; span=(20, 29), match='sanchuang'>
- >>> re.search("sanchuang","hello world this is")
- >>> result = re.search("sanchuang","hello world this is")
- >>> print(result)
- None
- >>> re.search("sanchuang","hello world sanchuang this is")
- <_sre.SRE_Match object; span=(12, 21), match='sanchuang'>
- >>> re.search("sanchuang","hello world,sanchuang this is sanchuang")
- <_sre.SRE_Match object; span=(12, 21), match='sanchuang'>
- >>>
- >>> re.search(r"\\\\tsanle","hello\\\\tsanle")
- <_sre.SRE_Match object; span=(5, 13), match='\\\\tsanle'>
- >>> re.search("\\\\tsanle","hello\\\\tsanle")
- <_sre.SRE_Match object; span=(6, 13), match='\\tsanle'>
- >>> re.search("\\tsanle","hello\\\tsanle")
- <_sre.SRE_Match object; span=(6, 12), match='\tsanle'>
- >>>
- >>> msg = "It's raining cats and dogs"
- >>> match = re.search(r"cats",msg)
- >>> match.start()
- 13
- >>> match.end()
- 17
- >>>
- >>> import re
- >>> re.match("sanchuang","sanchuang hello world this is")
- <_sre.SRE_Match object; span=(0, 9), match='sanchuang'>
- >>> result1 = re.match("sanchuang","hello world this is ")
- >>> print(result1)
- None
- >>> result2 = re.match("sanchuang","hello world,sanchuang this is sanchuang")
- >>> print(result2)
- None
- >>> match.groups()  # match comes from re.search(r"cats",msg), which has no capture groups
- ()
- >>> match = re.search(r"(cats)",msg)
- >>> match.groups()
- ('cats',)
- >>>
- msg = "It's raining cats and dogs, cats1 cats2"
- result = re.findall("cats",msg)
- print(result)
- result2 = re.finditer("cats",msg)
- print(result2)
- for i in result2:
-     print(i.group())
-
- Output:
- ['cats', 'cats', 'cats']
- <callable_iterator object at 0x000002267C80B820>
- cats
- cats
- cats
-
-
- msg = "It's raining cats and dogs, cats1 cats2"
- result3 = re.finditer("cats",msg)
- print(list(result3))
-
- Output:
- [<re.Match object; span=(13, 17), match='cats'>, <re.Match object; span=(28, 32), match='cats'>, <re.Match object; span=(34, 38), match='cats'>]
- msg = "I am learning python"
- print(re.sub("python","PYTHON",msg))
-
- Output:
- I am learning PYTHON
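Two further `re.sub` behaviours are worth a quick sketch: the optional `count` argument limits how many replacements are made, and the replacement string can refer back to capture groups with `\1`, `\2`, and so on. The sample string is an assumption made up for this example.

```python
import re

msg = "cats and dogs, cats1 cats2"

# Replace every occurrence (the default).
replaced_all = re.sub("cats", "CATS", msg)

# Replace only the first occurrence.
replaced_first = re.sub("cats", "CATS", msg, count=1)

# Swap the two captured words using backreferences in the replacement.
swapped = re.sub(r"(cats) and (dogs)", r"\2 and \1", msg)

print(replaced_all)    # CATS and dogs, CATS1 CATS2
print(replaced_first)  # CATS and dogs, cats1 cats2
print(swapped)         # dogs and cats, cats1 cats2
```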
- msg = "I am learning python"
- msg2 = "I am learning English"
- msg3 = "hello world"
- reg = re.compile("python")
- print(reg.findall(msg))
- print(reg.findall(msg2))
- print(reg.findall(msg3))
- print(re.findall("python",msg))
Characteristics of compiled regular expressions:
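One property that is easy to demonstrate, offered here as an illustration and not as the author's original list: a compiled pattern object is built once, can carry flags, and can then be reused against many strings. The `re.IGNORECASE` flag and the sample messages are assumptions added for this sketch.

```python
import re

# Compile once; the flag is stored with the pattern object.
reg = re.compile("python", re.IGNORECASE)

messages = [
    "I am learning python",
    "Python 3 python",
    "hello world",
]

# Reuse the same compiled object for every string.
counts = [len(reg.findall(m)) for m in messages]

print(reg.pattern)  # python
print(counts)       # [1, 2, 0]
```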
- ret = re.findall("python","Python 3 python")
- print(ret)
- ret1 = re.findall("[Ppfg]ython","Python 3 python fython Fython")
- print(ret1)
- ret2 = re.findall(r"[a-zA-Z\-]","abcABC-123-")
- print(ret2)
-
- Output:
- ['python']
- ['Python', 'python', 'fython']
- ['a', 'b', 'c', 'A', 'B', 'C', '-', '-']
- msg = "It's raining cats and dogs"
- ret = re.search("cats|dogs",msg)
- print(ret.group())
- ret1 = re.findall("cats|dogs",msg)
- print(ret1)
-
- Output:
- cats
- ['cats', 'dogs']
# re.search returns the first match; re.findall returns all matches
- ret = re.findall("[0-z]","lab3cb3ala#>=?!aB")
- print(ret)
- ret1 = re.findall("[^0-9A-Za-z]","lab3cb3al#>=?!aB")
- print(ret1)
- ret2 = re.findall("a[^a-z]","lab3cb3al#>=?!aB")
- print(ret2)
-
- Output:
- ['l', 'a', 'b', '3', 'c', 'b', '3', 'a', 'l', 'a', '>', '=', '?', 'a', 'B']
- ['#', '>', '=', '?', '!']
- ['aB']
- ret = re.findall("p.thon","python pYTHON Python pYthon Pthon p=thon")
- print(ret)
- ret = re.findall("p.thon","python pYTHON Python pYthon Pthon p thon p\nthon")
- print(ret)
-
- Output:
- ['python', 'pYthon', 'p=thon']
- ['python', 'pYthon', 'p thon']
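The last result shows that `.` does not match a newline by default. A minimal sketch of how this changes with the `re.DOTALL` flag (the flag is not covered in the text above and is brought in here only to illustrate the point):

```python
import re

text = "p thon p\nthon"

# By default "." matches any character except a newline.
default = re.findall("p.thon", text)

# With re.DOTALL, "." also matches the newline character.
dotall = re.findall("p.thon", text, flags=re.DOTALL)

print(default)  # ['p thon']
print(dotall)   # ['p thon', 'p\nthon']
```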
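Shorthand | Function |
---|---|
\A | Matches the start of the string |
\bword\b | Word boundary |
\w | Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]' for ASCII text |
\W | Matches any non-word character. Equivalent to '[^A-Za-z0-9_]' for ASCII text |
\d | Matches a digit character. Equivalent to [0-9] for ASCII text |
\D | Matches a non-digit character. Equivalent to [^0-9] for ASCII text |
\s | Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v] for ASCII text |
\S | Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v] for ASCII text |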
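A short sketch of several of these shorthands in use. The sample text is invented for this example, and `re.split` is used as a convenient way to show `\s`; neither comes from the original post.

```python
import re

text = "order 42 shipped\ton 2024-01-05"

# \d+ : runs of digits
digits = re.findall(r"\d+", text)

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)

# \s+ : split on runs of whitespace (the space and the tab)
parts = re.split(r"\s+", text)

# \A : anchor the pattern to the very start of the string
starts_with_order = re.search(r"\Aorder", text) is not None

print(digits)  # ['42', '2024', '01', '05']
print(words)   # ['order', '42', 'shipped', 'on', '2024', '01', '05']
print(parts)   # ['order', '42', 'shipped', 'on', '2024-01-05']
print(starts_with_order)  # True
```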
Examples follow (when using these shorthands, write the pattern as a raw string with the r prefix):
- ## \bword\b ## --- digits, letters, and underscores do not count as boundaries
- ret = re.finditer(r"\bworld","hello world 123world =world world123 ##world## abcworldabc")
- print(list(ret))
- ret1 = re.finditer(r"world\b","hello world 123world =world world123 ##world## abcworldabc")
- print(list(ret1))
- ret2 = re.finditer(r"\bworld\b","hello world 123world =world world123 ##world## abcworldabc")
- print(list(ret2))
-
- Output:
- [<re.Match object; span=(6, 11), match='world'>, <re.Match object; span=(22, 27), match='world'>, <re.Match object; span=(28, 33), match='world'>, <re.Match object; span=(39, 44), match='world'>]
- [<re.Match object; span=(6, 11), match='world'>, <re.Match object; span=(15, 20), match='world'>, <re.Match object; span=(22, 27), match='world'>, <re.Match object; span=(39, 44), match='world'>]
- [<re.Match object; span=(6, 11), match='world'>, <re.Match object; span=(22, 27), match='world'>, <re.Match object; span=(39, 44), match='world'>]
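\B matches a position that is not a word boundary, so \Bworld\B matches "world" only when it has word characters on both sides.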
- ret = re.finditer(r"\Bworld\B","hello _world world123 123world =world ##world## abcworldabc")
- print(list(ret))
-
- Output:
- [<re.Match object; span=(51, 56), match='world'>]
- # \w \W
- ret = re.findall(r'\w',"python3#")
- print(ret)
- ret = re.findall(r'\W',"python3#")
- print(ret)
-
- Output:
- ['p', 'y', 't', 'h', 'o', 'n', '3']
- ['#']
- ret = re.findall("^python","hello python")
- print(ret)
- ret1 = re.findall("^python","python123#")
- print(ret1)
- ret2 = re.findall("python$","hello python")
- print(ret2)
- ret3 = re.findall("^python$","hello python")
- print(ret3)
-
- Output:
- []
- ['python']
- ['python']
- []
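The results above treat the whole string as a single unit. For multi-line text, a minimal sketch of how the same anchors behave with the `re.MULTILINE` flag; the flag and the three-line sample text are additions for illustration and do not appear in the original examples.

```python
import re

text = "learn python\npython 3\nhello"

# Default: ^ matches only at the start of the string, $ only at its end.
start_default = re.findall("^python", text)
end_default = re.findall("python$", text)

# re.MULTILINE: ^ and $ also match at the start and end of each line.
start_multiline = re.findall("^python", text, flags=re.MULTILINE)
end_multiline = re.findall("python$", text, flags=re.MULTILINE)

print(start_default)    # []
print(end_default)      # []
print(start_multiline)  # ['python']
print(end_multiline)    # ['python']
```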
Copyright © 2003-2013 www.wpsshop.cn. All rights reserved.