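When reading or writing a file, the file must be closed properly once you are done with it. One way is to use try...finally; a more convenient way is the with statement, as in with open(filename, 'r') as f: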
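To make the comparison concrete, here is a minimal sketch of both forms reading the same file. The temporary file and the variable names are assumptions made only so the snippet runs anywhere.

```python
import os
import tempfile

# Hypothetical scratch file, created only so the example can run anywhere.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('hello')

# Form 1: close explicitly in a finally clause.
f1 = open(path, 'r')
try:
    first = f1.read()
finally:
    f1.close()

# Form 2: the with statement closes the file on exit from the block.
with open(path, 'r') as f2:
    second = f2.read()

print(first == second, f1.closed, f2.closed)
os.remove(path)
```

Both forms leave the file closed even if read() raises; the with form simply needs less code.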
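In fact, any object that correctly implements the context management protocol can be used with the with statement. The protocol consists of two methods, __enter__ and __exit__: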
class Query:

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        print('Begin')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type:
            print('Error')
        else:
            print('End')

    def query(self):
        print('Query info about %s...' % self.name)

with Query('Bob') as q:
    q.query()
Begin
Query info about Bob...
End
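The run above only exercises the 'End' branch of __exit__. The following sketch, a variant of the Query class written for illustration (the log list and the failing query() are assumptions, not part of the original class), shows what happens when the with block raises.

```python
class Query:
    """Same shape as the Query class above, but it records events in a list."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append('Begin')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append('Error' if exc_type else 'End')
        # Returning None (falsy) lets the exception propagate to the caller.

    def query(self):
        raise ValueError('no record for %s' % self.name)

log = []
try:
    with Query('Bob', log) as q:
        q.query()
except ValueError as e:
    log.append('Caught: %s' % e)

print(log)
```

If __exit__ returned True instead, the exception would be suppressed and the except clause would never run.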
Writing __enter__ and __exit__ by hand is still tedious, so contextlib provides a simpler way:
from contextlib import contextmanager

class Query:

    def __init__(self, name):
        self.name = name

    def query(self):
        print('Query info about %s...' % self.name)

@contextmanager
def create_query(name):
    print('Begin')
    q = Query(name)
    yield q
    print('End')

with create_query('Bob') as q:
    q.query()
Begin
Query info about Bob...
End
The @contextmanager decorator takes a generator function. The value produced by yield is bound to var in with ... as var, and the with statement then works as usual.
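One detail worth noting: in create_query above, the code after yield is skipped if the with block raises. A common pattern, sketched here under the assumption that cleanup should always run, is to wrap the yield in try...finally. The events list is only there to make the order visible.

```python
from contextlib import contextmanager

events = []

@contextmanager
def create_query(name):
    events.append('Begin')
    try:
        yield name
    finally:
        # Runs whether the with block finished normally or raised.
        events.append('End')

try:
    with create_query('Bob') as q:
        raise RuntimeError('query failed for %s' % q)
except RuntimeError:
    events.append('Caught')

print(events)
```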
Often we want some code to run automatically before and after a block of code. @contextmanager can do this too:
from contextlib import contextmanager

@contextmanager
def tag(name):
    print("<%s>" % name)
    yield
    print("</%s>" % name)

with tag("h1"):
    print("hello")
    print("world")
<h1>
hello
world
</h1>
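The code runs in this order:
1. The with statement first runs the statements before yield, printing <h1>.
2. yield hands control to the body of the with statement, which runs all of its statements, printing hello and world.
3. Finally, the statements after yield run, printing </h1>.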
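The same ordering holds when context managers are nested. This small sketch collects the lines in a list instead of printing them directly (the lines list and the div tag are assumptions for illustration), so the order can be inspected.

```python
from contextlib import contextmanager

lines = []

@contextmanager
def tag(name):
    lines.append('<%s>' % name)   # runs on entry, before the with body
    yield
    lines.append('</%s>' % name)  # runs on exit, after the with body

with tag('div'):
    with tag('h1'):
        lines.append('hello')

print('\n'.join(lines))
```

The inner tag closes before the outer one, which is why the result is properly nested markup.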
from contextlib import contextmanager
from urllib.request import urlopen

# contextlib already ships closing(); it is redefined here to show how it works.
@contextmanager
def closing(thing):
    try:
        yield thing
    finally:
        thing.close()

with closing(urlopen('https://www.python.org')) as page:
    for line in page:
        pass  # print(line) would print every line of the page
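closing is mainly useful for objects that have a close() method but do not implement the context management protocol themselves. A minimal sketch, using a made-up Connection class as a stand-in for such an object:

```python
from contextlib import closing

class Connection:
    """Hypothetical resource with close() but no __enter__/__exit__."""

    def __init__(self):
        self.open = True

    def send(self, message):
        return 'sent: %s' % message

    def close(self):
        self.open = False

conn = Connection()
with closing(conn) as c:
    reply = c.send('ping')

print(reply, conn.open)
```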
The request module of urllib makes it easy to fetch the content of a URL: it sends a GET request to the given page and returns the HTTP response.
from urllib import request

with request.urlopen('https://www.python.org') as f:
    data = f.read()
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    # print('Data:', data.decode('utf-8'))
Status: 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
X-Frame-Options: DENY
Via: 1.1 vegur
Via: 1.1 varnish
Content-Length: 49123
Accept-Ranges: bytes
Date: Tue, 05 May 2020 18:46:26 GMT
Via: 1.1 varnish
Age: 769
Connection: close
X-Served-By: cache-bwi5122-BWI, cache-mdw17348-MDW
X-Cache: HIT, HIT
X-Cache-Hits: 4, 2
X-Timer: S1588704387.816773,VS0,VE0
Vary: Cookie
Strict-Transport-Security: max-age=63072000; includeSubDomains
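Real requests can fail, so in practice the call is usually wrapped in error handling. The fetch helper below is a hypothetical sketch, not part of urllib; it relies only on urllib.error.HTTPError and urllib.error.URLError, and uses a data: URL so that it can run without network access.

```python
from urllib import error, request

def fetch(url, timeout=10):
    """Return the response body as bytes, or None on HTTPError/URLError."""
    try:
        with request.urlopen(url, timeout=timeout) as f:
            return f.read()
    except error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        print('HTTP error:', e.code, e.reason)
    except error.URLError as e:
        # No response was received (DNS failure, refused connection, ...).
        print('Connection problem:', e.reason)
    return None

# A data: URL is handled locally by urllib, so this call needs no network.
body = fetch('data:text/plain;charset=utf-8,hello')
print(body)
```

HTTPError is a subclass of URLError, so it has to be caught first.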
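To simulate a browser sending a GET request, use a Request object. By adding HTTP headers to the Request object, we can disguise the request as coming from a browser. For example, to request the Douban home page as an iPhone 6: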
from urllib import request

req = request.Request('http://www.douban.com/')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
with request.urlopen(req) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
Status: 200 OK
Date: Tue, 05 May 2020 18:49:12 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Vary: Accept-Encoding
X-Xss-Protection: 1; mode=block
X-Douban-Mobileapp: 0
Expires: Sun, 1 Jan 2006 01:00:00 GMT
Pragma: no-cache
Cache-Control: must-revalidate, no-cache, private
Set-Cookie: bid=0uXd--_Fj8w; Expires=Wed, 05-May-21 18:49:12 GMT; Domain=.douban.com; Path=/
X-DOUBAN-NEWBID: 0uXd--_Fj8w
X-DAE-App: talion
X-DAE-Instance: default
Server: dae
Strict-Transport-Security: max-age=15552000
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Data: <!DOCTYPE html>
<html itemscope itemtype="http://schema.org/WebPage" class="ua-safari ua-mobile ">
<head>
<meta charset="UTF-8">
<title>豆瓣(手机版)</title>
<meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
<meta name="viewport" content="width=device-width, height=device-height, user-scalable=no, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0">
...
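Headers can also be passed to the Request constructor, and the request can be inspected before anything is sent. A small sketch, with a shortened User-Agent string that is an assumption for illustration:

```python
from urllib import request

# Hypothetical, shortened User-Agent string used only for illustration.
UA = 'Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X)'

req = request.Request('http://www.douban.com/', headers={'User-Agent': UA})

# Request stores header names capitalized, so the key becomes 'User-agent'.
print(req.get_method())
print(req.get_header('User-agent'))
print(req.header_items())
```

Nothing goes over the network here, because urlopen() is never called.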
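To send a POST request, simply pass the data parameter as bytes.
Here we simulate a Weibo login: read the login email and password, then pass them encoded as username=xxx&password=xxx, following the format of the weibo.cn login page: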
from urllib import request, parse

print('Login to weibo.cn')
email = input('Email: ')
passwd = input('Password: ')
# Encode a dict or sequence of two-element tuples into a URL query string.
login_data = parse.urlencode([
    ('username', email),
    ('password', passwd),
    ('entry', 'mweibo'),
    ('client_id', ''),
    ('savestate', 1),
    ('ec', ''),
    ('pagerefer', 'https://passport.weibo.cn/signin/welcome?entry=mweibo&r=http%3A%2F%2Fm.weibo.cn%2F')
])

req = request.Request('https://passport.weibo.cn/sso/login')
req.add_header('Origin', 'https://passport.weibo.cn')
req.add_header('User-Agent', 'Mozilla/6.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Version/8.0 Mobile/10A5376e Safari/8536.25')
req.add_header('Referer', 'https://passport.weibo.cn/signin/login?entry=mweibo&res=wel&wm=3349&r=http%3A%2F%2Fm.weibo.cn%2F')

with request.urlopen(req, data=login_data.encode('utf-8')) as f:
    print('Status:', f.status, f.reason)
    for k, v in f.getheaders():
        print('%s: %s' % (k, v))
    print('Data:', f.read().decode('utf-8'))
Login to weibo.cn
Status: 200 OK
Server: nginx/1.6.1
Date: Tue, 05 May 2020 18:57:36 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding
Cache-Control: no-cache, must-revalidate
Expires: Sat, 26 Jul 1997 05:00:00 GMT
Pragma: no-cache
Access-Control-Allow-Origin: https://passport.weibo.cn
Access-Control-Allow-Credentials: true
DPOOL_HEADER: localhost.localdomain
Data: {"retcode":50011007,"msg":"\u8bf7\u8f93\u5165\u7528\u6237\u540d","data":{"errline":320}}
The login failed: the msg field in the response decodes to "请输入用户名" ("Please enter a username").
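The two data-handling steps in this example, encoding the form fields and reading the JSON reply, can be tried without contacting any server. The credentials and the example.com endpoint below are made up for illustration; the JSON string is the Data line shown in the output above.

```python
import json
from urllib import parse, request

# Hypothetical credentials, used only to show the encoding.
fields = [('username', 'bob@example.com'), ('password', 'p w')]
body = parse.urlencode(fields).encode('utf-8')
print(body)

# Hypothetical endpoint; nothing is sent because urlopen() is never called.
req = request.Request('https://example.com/login', data=body)
print(req.get_method())  # a Request that carries data defaults to POST

# The Data line from the login attempt is JSON with \u escapes.
raw = r'{"retcode":50011007,"msg":"\u8bf7\u8f93\u5165\u7528\u6237\u540d","data":{"errline":320}}'
msg = json.loads(raw)['msg']
print(msg == '请输入用户名')
```

urlencode percent-encodes reserved characters such as @ and turns spaces into +, which is the format an HTML form submission uses.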
If more complex control is needed, such as accessing a website through a proxy, we need to use ProxyHandler.
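A minimal sketch of how ProxyHandler is typically wired in, assuming a proxy listening at a made-up local address. The opener is built but never used, so the snippet runs without a proxy or network access.

```python
from urllib import request

# Hypothetical proxy address; replace it with a proxy you actually control.
PROXY = 'http://127.0.0.1:3128'

# Route both http and https requests through the proxy.
proxy_handler = request.ProxyHandler({'http': PROXY, 'https': PROXY})
opener = request.build_opener(proxy_handler)

# opener.open(url) would now go through the proxy; it is not called here,
# so the sketch runs without a proxy or network access.
print(type(opener).__name__)
```

Calling request.install_opener(opener) would make plain urlopen() calls use the same proxy settings.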
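What urllib provides is the ability to issue all kinds of HTTP requests from a program. To simulate a browser performing a particular task, the request has to be disguised as a browser request. The way to do that is to observe the requests the browser sends and then copy its request headers; the User-Agent header is the one that identifies the browser.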
from xml.parsers.expat import ParserCreate

class DefaultSaxHandler(object):

    def start_element(self, name, attrs):
        print('sax:start_element: %s, attrs: %s' % (name, str(attrs)))

    def end_element(self, name):
        print('sax:end_element: %s' % name)

    def char_data(self, text):
        print('sax:char_data: %s' % text)

xml = r'''<?xml version="1.0"?>
<ol>
    <li><a href="/python">Python</a></li>
    <li><a href="/ruby">Ruby</a></li>
</ol>
'''

handler = DefaultSaxHandler()
parser = ParserCreate()
parser.StartElementHandler = handler.start_element
parser.EndElementHandler = handler.end_element
parser.CharacterDataHandler = handler.char_data
parser.Parse(xml)
sax:start_element: ol, attrs: {}
sax:char_data:
sax:char_data:
sax:start_element: li, attrs: {}
sax:start_element: a, attrs: {'href': '/python'}
sax:char_data: Python
sax:end_element: a
sax:end_element: li
sax:char_data:
sax:char_data:
sax:start_element: li, attrs: {}
sax:start_element: a, attrs: {'href': '/ruby'}
sax:char_data: Ruby
sax:end_element: a
sax:end_element: li
sax:char_data:
sax:end_element: ol
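Printing every event is useful for seeing how the parser works, but a handler usually keeps some state and builds a result. The LinkCollector class below is a hypothetical sketch along those lines: it gathers (href, text) pairs from the same XML, joining text pieces because the character data handler may be called more than once for a single text node.

```python
from xml.parsers.expat import ParserCreate

class LinkCollector:
    """Hypothetical handler that collects (href, text) pairs from <a> elements."""

    def __init__(self):
        self.links = []
        self._href = None
        self._text = []

    def start_element(self, name, attrs):
        if name == 'a':
            self._href = attrs.get('href')
            self._text = []

    def char_data(self, text):
        # Only keep text that appears inside an <a> element.
        if self._href is not None:
            self._text.append(text)

    def end_element(self, name):
        if name == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text)))
            self._href = None

xml = '''<?xml version="1.0"?>
<ol>
    <li><a href="/python">Python</a></li>
    <li><a href="/ruby">Ruby</a></li>
</ol>
'''

collector = LinkCollector()
parser = ParserCreate()
parser.StartElementHandler = collector.start_element
parser.EndElementHandler = collector.end_element
parser.CharacterDataHandler = collector.char_data
parser.Parse(xml, True)
print(collector.links)
```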
from html.parser import HTMLParser
from html.entities import name2codepoint

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print('<%s>' % tag)

    def handle_endtag(self, tag):
        print('</%s>' % tag)

    def handle_startendtag(self, tag, attrs):
        print('<%s/>' % tag)

    def handle_data(self, data):
        print(data)

    def handle_comment(self, data):
        print('<!--', data, '-->')

    def handle_entityref(self, name):
        print('&%s;' % name)

    def handle_charref(self, name):
        print('&#%s;' % name)

parser = MyHTMLParser()
parser.feed('''<html>
<head></head>
<body>
<!-- test html parser -->
    <p>Some <a href=\"#\">html</a> HTML tutorial...<br>END</p>
</body></html>''')
<html>
<head>
</head>
<body>
<!-- test html parser -->
<p>
Some
<a>
html
</a>
 HTML tutorial...
<br>
END
</p>
</body>
</html>
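The parser above ignores the attrs argument. As a small sketch of how attributes can be used, the hypothetical LinkExtractor below collects the href of every a tag; the input string is made up for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Hypothetical parser that collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [('href', '#')].
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.links.append(href)

parser = LinkExtractor()
parser.feed('<p>Some <a href="#">html</a> and <a href="/docs">docs</a><br>END</p>')
print(parser.links)
```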
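Exercise: pick a web page, for example https://www.python.org/events/python-events/, view its source in a browser and copy it, then try parsing the HTML to print the dates, names, and locations of the conferences listed on the Python website.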
from html.parser import HTMLParser
from urllib.request import urlopen

class EventHTMLParser(HTMLParser):

    def __init__(self):
        super().__init__()
        self.tag = None

    def handle_starttag(self, tag, attrs):
        if ('class', 'event-title') in attrs:
            self.tag = 'Event-title'
        if tag == 'time':
            self.tag = 'Time'
        if ('class', 'say-no-more') in attrs:
            self.tag = 'Year'
        elif ('class', 'event-location') in attrs:
            self.tag = 'Event-location'

    def handle_data(self, data):
        if self.tag:
            print(self.tag, data)

    def handle_endtag(self, tag):
        if self.tag:
            self.tag = None

with urlopen('https://www.python.org/events/python-events') as f:
    # str() on bytes returns the b'...' repr, which is why non-ASCII characters
    # show up as \x escapes in the output; f.read().decode('utf-8') avoids this.
    html_data = str(f.read())

parser = EventHTMLParser()
parser.feed(html_data)
Event-title PyConWeb 2020 (canceled)
Time 09 May – 10 May
Year 2020
Event-location Munich, Germany
Event-title Django Girls Groningen
Time 16 May
Year 2020
Event-location Groningen, Netherlands
Event-title PyLondinium 2020 (postponed)
Time 05 June – 07 June
Year 2020
Event-location London, UK
Event-title PyCon CZ 2020 (canceled)
Time 05 June – 07 June
Year 2020
Event-location Ostrava, Czech Republic
Event-title PyCon Odessa 2020
Time 13 June – 14 June
Year 2020
Event-location Odessa, Ukraine
Event-title Python Web Conference 2020 (Online-Worldwide)
Time 17 June – 19 June
Year 2020
Event-location https://2020.pythonwebconf.com
Event-title Python Meeting D\xc3\xbcsseldorf
Time 01 July
Year 2020
Event-location D\xc3\xbcsseldorf, Germany
Event-title SciPy 2020
Time 06 July – 12 July
Year 2020
Event-location Online
Event-title Python Nordeste 2020
Time 17 July – 19 July
Year 2020
Event-location Fortaleza, Cear\xc3\xa1, Brasil
Event-title EuroPython 2020 (in-person: canceled, considering going virtual)
Time 20 July – 26 July
Year 2020
Event-location https://blog.europython.eu/post/612826526375919616/europython-2020-going-virtual-europython-2021
Event-title EuroPython 2020 Online
Time 23 July – 26 July
Year 2020
Event-location Online Event
Event-title EuroSciPy 2020 (canceled)
Time 27 July – 31 July
Year 2020
Event-location Bilbao, Spain
Event-title PyCon JP 2020
Time 28 Aug. – 29 Aug.
Year 2020
Event-location Tokyo, Japan
Event-title PyCon TW 2020
Time 05 Sept. – 06 Sept.
Year 2020
Event-location International Conference Hall ,No.1, University Road, Tainan City 701, Taiwan
Event-title PyCon SK 2020
Time 11 Sept. – 13 Sept.
Year 2020
Event-location Bratislava, Slovakia
Event-title DjangoCon Europe 2020
Time 16 Sept. – 20 Sept.
Year 2020
Event-location Porto, Portugal
Event-title DragonPy 2020
Time 19 Sept. – 20 Sept.
Year 2020
Event-location Ljubljana, Slovenia
Event-title PyCon APAC 2020
Time 19 Sept. – 20 Sept.
Year 2020
Event-location Kota Kinabalu, Sabah, Malaysia
Event-title Django Day Copenhagen
Time 25 Sept.
Year 2020
Event-location Copenhagen, Denmark
Event-title PyCon Turkey
Time 26 Sept. – 27 Sept.
Year 2020
Event-location Albert Long Hall, at Bogazici University Istanbul
Event-title Python Meeting D\xc3\xbcsseldorf
Time 30 Sept.
Year 2020
Event-location D\xc3\xbcsseldorf, Germany
Event-title PyCon India 2020
Time 02 Oct. – 05 Oct.
Year 2020
Event-location Bangalore, India
Event-title PyConDE & PyData Berlin 2020
Time 14 Oct. – 16 Oct.
Year 2020
Event-location Berlin, Germany
Event-title Swiss Python Summit
Time 23 Oct.
Year 2020
Event-location Rapperswil, Switzerland
Event-title PyCC Meetup'19 (Python Cape Coast User Group)
Time 26 Oct.
Year 2020
Event-location Cape coast, Ghana
Event-title Python Brasil 2020
Time 28 Oct. – 02 Nov.
Year 2020
Event-location Caxias do Sul, RS, Brazil
Event-title PyData London 2020
Time 30 Oct. – 01 Nov.
Year 2020
Event-location London, UK
Event-title PyCon Italia 2020
Time 05 Nov. – 08 Nov.
Year 2020
Event-location Florence, Italy
Event-title enterPy
Time 23 Nov. – 24 Nov.
Year 2020
Event-location Mannheim, Germany
Event-title PyCon US 2021
Time 12 May – 20 May
Year 2021
Event-location Pittsburgh, PA, USA
Event-title SciPy 2021
Time 12 July – 18 July
Year 2021
Event-location Austin, TX, US
Event-title EuroPython 2021
Time 26 July – 01 Aug.
Year 2021
Event-location Dublin, Ireland
Year General
Year Initiatives
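Printing fields as they arrive makes the result hard to reuse. As a possible variation, the hypothetical EventCollector below groups the fields of each event into a dict. The SAMPLE markup is an assumption modeled on the class names used in the parser above (event-title, say-no-more, event-location), not a copy of the real page, so the sketch runs offline.

```python
from html.parser import HTMLParser

# Hypothetical markup, modeled on the class names the parser above looks for.
SAMPLE = '''
<ul>
  <li>
    <h3 class="event-title"><a href="/events/1/">PyCon Example 2020</a></h3>
    <p><time datetime="2020-05-09">09 May <span class="say-no-more">2020</span></time>
       <span class="event-location">Munich, Germany</span></p>
  </li>
</ul>
'''

class EventCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.events = []
        self._field = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if 'event-title' in classes:
            self.events.append({})      # a title starts a new event
            self._field = 'title'
        elif tag == 'time':
            self._field = 'time'
        elif 'say-no-more' in classes:
            self._field = 'year'
        elif 'event-location' in classes:
            self._field = 'location'

    def handle_data(self, data):
        text = data.strip()
        if self._field and text and self.events:
            self.events[-1][self._field] = text

    def handle_endtag(self, tag):
        self._field = None

collector = EventCollector()
collector.feed(SAMPLE)
print(collector.events)
```

Splitting the class attribute on whitespace also matches elements that carry several classes, which the tuple comparison in the original parser would miss.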