A regular expression is a "pattern string" built from predefined special characters and used to search or filter text.
1. Basic Patterns:
search() returns a Match object for the first occurrence found, or None if nothing matches
- import re
-
- text='Today is a nice day, and today is Monday.'
- pattern='today'
- match=re.search(pattern,text) # search() returns a Match object
- print(match)
<re.Match object; span=(25, 30), match='today'>
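findall() finds every match and returns them as a list of str
finditer() returns an iterator of Match objects
- for match in re.finditer(pattern,text):
-     print(match)        # the Match object
-     print(match.span()) # (start, end) indices of the match
group() returns the actual text that matched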
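To make the difference between these four calls concrete, here is a minimal sketch that runs them on the sample sentence from above. The second pattern, 'day', and the variable names are chosen only for illustration.

```python
import re

text = 'Today is a nice day, and today is Monday.'

# search() stops at the first hit and returns a Match object (or None).
first = re.search('today', text)
first_text = first.group() if first else None   # the matched text
first_span = first.span() if first else None    # (start, end) indices

# findall() returns the matched strings; finditer() yields Match objects.
all_texts = re.findall('day', text)
all_spans = [m.span() for m in re.finditer('day', text)]

# When nothing matches, search() returns None, so check before calling group().
missing = re.search('tomorrow', text)

print(first_text, first_span)   # today (25, 30)
print(all_texts)                # ['day', 'day', 'day', 'day']
print(all_spans)                # [(2, 5), (16, 19), (27, 30), (37, 40)]
print(missing)                  # None
```

Note that 'day' also matches inside 'Today', 'today' and 'Monday', because a plain pattern matches substrings, not whole words.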
2. Simple Patterns:
[] Disjunction
pattern='[Tt]oday' #Today or today
[A-Z] | an uppercase letter |
[a-z] | a lowercase letter |
[0-9] | a single digit |
[b-g] | b, c, d, e, f or g |
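The ranges in the table can be checked with a short sketch like the one below. The sample string is made up for illustration.

```python
import re

text = 'Flat 7B, Gate g2'   # hypothetical sample string

uppercase = re.findall('[A-Z]', text)   # every uppercase letter
lowercase = re.findall('[a-z]', text)   # every lowercase letter
digits = re.findall('[0-9]', text)      # every single digit
b_to_g = re.findall('[b-g]', text)      # only the letters b through g

print(uppercase)  # ['F', 'B', 'G']
print(lowercase)  # ['l', 'a', 't', 'a', 't', 'e', 'g']
print(digits)     # ['7', '2']
print(b_to_g)     # ['e', 'g']
```

Each bracket expression matches exactly one character, which is why findall() returns single-character strings here.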
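^ (NOT / start of line) and $ (end of line)
Note: ^ has two main uses
a) As the first character inside [] it negates the set (it must be inside the brackets)
pattern='[^a-z]' # any single character that is not a lowercase letter
b) Outside [] it matches the start of the line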
-
- text='Today is a happy day. Today is Monday'
- pattern1='^Today' # start-of-line match
- pattern2='Monday$' # end-of-line match
- re.findall(pattern1,text) # returns only the first 'Today'
- re.findall(pattern2,text) # returns 'Monday'
-
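One detail worth seeing in code: by default ^ and $ anchor to the start and end of the whole string, and Python's re.MULTILINE flag makes them anchor to every line instead. The sketch below assumes a made-up two-line string and also shows the negating use of ^ inside brackets.

```python
import re

text = 'Today is Monday\nToday is sunny'   # hypothetical two-line sample

# Without flags, ^ only matches at the very start of the string.
default_starts = re.findall('^Today', text)

# With re.MULTILINE, ^ and $ match at the start and end of each line.
line_starts = re.findall('^Today', text, flags=re.MULTILINE)
line_ends = re.findall('[A-Za-z]+$', text, flags=re.MULTILINE)

# Inside [], a leading ^ negates the set instead of anchoring.
not_lower = re.findall('[^a-z]', 'abC d1')

print(default_starts)  # ['Today']
print(line_starts)     # ['Today', 'Today']
print(line_ends)       # ['Monday', 'sunny']
print(not_lower)       # ['C', ' ', '1']
```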
? Question mark
- # match zero or one occurrence of the preceding character
- text='apple and apples'
- pattern='apples?' # ? applies to the character right before it
- re.findall(pattern,text) # returns 'apple' and 'apples'
* Kleene star
- # match zero or more repetitions of the preceding character
- text='b and baaaa and ba'
- pattern='ba*' # * applies to the preceding char, so this means no 'a' or any number of repeated 'a's
- re.findall(pattern,text) # returns 'b', 'baaaa', 'ba'
+ Kleene plus
- # match one or more repetitions of the preceding character
- text='b and baaaa and ba'
- pattern='ba+' # + applies to the preceding char, so this means at least one 'a', possibly repeated
- re.findall(pattern,text) # returns 'baaaa', 'ba'
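Running ?, * and + against the same input makes the difference between the three quantifiers easier to see. This is a small sketch with a made-up test string.

```python
import re

text = 'b ba baa baaa'   # hypothetical sample string

optional = re.findall('ba?', text)  # 'b' followed by zero or one 'a'
star = re.findall('ba*', text)      # 'b' followed by zero or more 'a'
plus = re.findall('ba+', text)      # 'b' followed by one or more 'a'

print(optional)  # ['b', 'ba', 'ba', 'ba']
print(star)      # ['b', 'ba', 'baa', 'baaa']
print(plus)      # ['ba', 'baa', 'baaa']
```

With ?, the extra 'a' characters in 'baa' and 'baaa' are left unmatched; with +, the lone 'b' is skipped because at least one 'a' is required.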
. Wild Card
- # matches any single character (except a newline, by default)
- text = 'care and core'
- pattern= 'c.re'
- re.findall(pattern,text) # returns 'care' and 'core'
| OR
- re.search('apple|orange','I like apple')
- re.search('cat(fish|nap|claw)','I like catfish and catnap')
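A point that often surprises people with grouped alternatives: when the pattern contains a capturing group, findall() returns the group's content rather than the whole match. The sketch below illustrates this with the catfish example; the non-capturing form (?:...) is standard Python syntax but is not covered in the notes above.

```python
import re

text = 'I like catfish and catnap'

m = re.search('cat(fish|nap|claw)', text)
whole = m.group()    # the full match
inner = m.group(1)   # only the part captured by the parentheses

# With a capturing group, findall() returns just the captured part.
captured = re.findall('cat(fish|nap|claw)', text)

# A non-capturing group (?:...) makes findall() return the full match.
full = re.findall('cat(?:fish|nap|claw)', text)

print(whole, inner)  # catfish fish
print(captured)      # ['fish', 'nap']
print(full)          # ['catfish', 'catnap']
```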
\b Word boundary
- text='the and other'
- pattern=r'\bthe\b' # word boundary on both sides
- re.findall(pattern,text) # returns only 'the'
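The r prefix on the pattern matters here. In a normal Python string, '\b' is the backspace character, so the regex engine never sees a word-boundary token. A minimal sketch comparing the three variants:

```python
import re

text = 'the and other'

# Plain substring pattern: also matches the 'the' inside 'other'.
substring_hits = re.findall('the', text)

# Raw string: \b reaches the regex engine as a word boundary.
word_hits = re.findall(r'\bthe\b', text)

# Non-raw string: '\b' is a backspace character, so nothing matches.
backspace_hits = re.findall('\bthe\b', text)

print(substring_hits)  # ['the', 'the']
print(word_hits)       # ['the']
print(backspace_hits)  # []
```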
3. Identifiers for Characters
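\d | a digit |
\w | a word character: letter, digit, or underscore |
\s | a whitespace character (space, tab, newline) |
\D | a non-digit |
\W | a non-word character, e.g. punctuation or whitespace |
\S | a non-whitespace character |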
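The identifiers in the table can be tried out with a short sketch like this one. The sample string is invented for illustration, and the patterns are written as raw strings so the backslashes reach the regex engine unchanged.

```python
import re

text = 'Room 42, floor_3!'   # hypothetical sample string

digits = re.findall(r'\d', text)     # single digits
words = re.findall(r'\w+', text)     # runs of letters, digits, underscores
spaces = re.findall(r'\s', text)     # whitespace characters
non_word = re.findall(r'\W', text)   # everything \w does not match

print(digits)    # ['4', '2', '3']
print(words)     # ['Room', '42', 'floor_3']
print(spaces)    # [' ', ' ']
print(non_word)  # [' ', ',', ' ', '!']
```

Note that the underscore in 'floor_3' counts as a word character, so \w+ keeps it in one piece.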
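For characters that repeat, there are special quantifier symbols
+ | one or more times |
{3} | exactly 3 times |
{2,4} | 2 to 4 times |
{3,} | 3 or more times (no space inside the braces) |
* | zero or more times |
? | zero or one time |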
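As a rough illustration of the counted quantifiers, the sketch below picks numbers of particular lengths out of a made-up string. The \b word boundaries are there so that a quantifier has to cover the whole number rather than part of a longer one.

```python
import re

text = '7 42 123 2024 31415'   # hypothetical sample string

exactly_three = re.findall(r'\b\d{3}\b', text)   # numbers with exactly 3 digits
two_to_four = re.findall(r'\b\d{2,4}\b', text)   # numbers with 2 to 4 digits
three_plus = re.findall(r'\b\d{3,}\b', text)     # numbers with 3 or more digits

print(exactly_three)  # ['123']
print(two_to_four)    # ['42', '123', '2024']
print(three_plus)     # ['123', '2024', '31415']
```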