赞
踩
对于网页的html标签要善于发现标签的一些特定写法有助于爬取正则表达式的书写,尤其一些独有的标签
列入 class、id、src等
# coding: utf-8 import re string = '<cc><div id="post_content_115101375872" class="d_post_content j_d_post_content ">秋高气爽<br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=16a7318cd2f9d72a17641015e42b282a/353680cb39dbb6fd5e8db0950224ab18952b379e.jpg" size="65387" changedsize="true" width="560" height="420" size="65387"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=80df3901b9b7d0a27bc90495fbee760d/1ddd0b55b319ebc49aae232a8926cffc1c17169e.jpg" size="50323" changedsize="true" width="560" height="420" size="50323"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=078b542b55df8db1bc2e7c6c3922dddb/33c010dfa9ec8a1338cc0253fc03918fa2ecc09f.jpg" size="78770" changedsize="true" width="560" height="373" size="78770"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=f4602793b212c8fcb4f3f6c5cc0292b4/b0c62934349b033bdd5ea9691ece36d3d739bd9f.jpg" size="95035" changedsize="true" width="560" height="373" size="95035"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=1f8f63cc73f0f736d8fe4c093a54b382/300b37d12f2eb9382fbb2f41de628535e7dd6f9f.jpg" size="100285" changedsize="true" width="560" height="373" size="100285"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=cf96d68331f33a879e6d0012f65d1018/6b5f0e2442a7d933d23ab1c0a64bd11371f001da.jpg" size="65247" changedsize="true" width="560" height="420" size="65247"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=dfff2f41de62853592e0d229a0ee76f2/4a9f8ad4b31c8701e524e7552c7f9e2f0508ffdb.jpg" size="79750" changedsize="true" width="560" height="414" size="79750"><br><img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=b4a6d35d9782d158bb8259b9b00b19d5/35d1279759ee3d6d3054d27848166d224d4adedb.jpg" size="103175" changedsize="true" width="560" height="414" size="103175"></div><br></cc><a href="https://www.baidu.com"></a>' # 1.构造正则表达式 pattern = re.compile(r'<img class="BDE_I.*?src="(.*?)".*?size="(.*?)".*?width="(.*?)".*?height="(.*?)"') # 2.findall()查找所有符合规则的字符串 rs = re.findall(pattern, string) for detail in rs : print '图片链接:%s'%detail[0] print '图片大小:%s'%detail[1] print '图片宽度:%s'%detail[2] print '图片高度:%s'%detail[3] print '**********************************************'
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。