Using the Selenium Library for Python 3 Web Scraping

Author: 小蓝xlanll | 2024-04-06 13:40:10

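Today I read through the Selenium library on the official site and summarized the commonly used methods, so here is the code directly. (Environment setup is omitted; there are plenty of guides online.) If you are new to Selenium, spend ten minutes on the basics first and then come back to this code.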

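Here are the commonly used methods for locating elements. XPath is the most general strategy: almost anything the other strategies can reach can also be reached with an XPath expression. Note that the find_element_by_* shortcuts listed first are the legacy spelling; they were deprecated in Selenium 4.0 and removed in 4.3.0, so on current versions use the find_element(By.XXX, ...) form listed after them.

Locating a single element: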

find_element_by_name
find_element_by_id
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

find_element(By.ID, "kw")
find_element(By.NAME, "wd")
find_element(By.CLASS_NAME, "s_ipt")
find_element(By.TAG_NAME, "input")
find_element(By.LINK_TEXT, "新闻")
find_element(By.PARTIAL_LINK_TEXT, "新")
find_element(By.XPATH, "//*[@class='bg s_btn']")
find_element(By.CSS_SELECTOR, "span.bg.s_btn_wr>input#su")

Locating multiple elements:

find_elements_by_name
find_elements_by_id
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector

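Each of these returns a list (empty when nothing matches); print(type(elements_name)) shows that the type is list. The current equivalent is find_elements(By.NAME, ...), find_elements(By.ID, ...), and so on.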


import time

import lxml.html
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
 
browser = webdriver.Chrome()
 
 
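# First the overall flow, then how to use XPath, then the commonly used methods.
'''
1. Import the webdriver module.
2. Use that module to create a browser object.
3. Open a URL with the browser object: browser.get("")
4. Read the HTML of the current page from the browser object: browser.page_source
5. Locate an element through the browser object and act on it: browser.find_element(By.ID, "q").click() (before Selenium 4.3 also browser.find_element_by_id("q").click())
6. XPath deserves the most attention, because the other strategies are simple:
        import lxml.html
        html1 = "html content"
        selector = lxml.html.fromstring(html1)
        (1) Tags without attributes can be left out of the path, and so can tags whose attributes are all identical.
        (2) Attribute starts with a given string: xpath('//div[starts-with(@id,"test")]/text()'), then iterate over the result.
        (3) Attribute value contains a given string: replace starts-with above with contains, then iterate over the result.

        (4) Text of child tags: lists_index = selector.xpath('//div[@class="useful"]'), then info_list = lists_index[0].xpath('ul/li/text()') and print it.
        (5) Text spread across nested tags: data = selector.xpath('//div[@id="test3"]')[0], then info = data.xpath('string(.)') and print it.
        (6) In (4), the first call returns every div whose class is useful as a list, so the first div is lists_index[0], and so on; the second call also returns its text nodes as a list.
            In (5), the [0] picks the first div whose id is test3; string(.) then returns all the text under that div as one string.
'''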
html1 = '''
<html>
    <head>
        <title>ceshi</title>
    </head>
    <body>
        <div class="useful">
            <ul>
                <li class="info">1</li>
                <li class="info">2</li>
                <li class="info">3</li>
                <li class="inf">4</li>
            </ul>
        </div>
        <div class = "useful">
            <ul>
                <li class="info">5</li>
                <li class="info">6</li>
            </ul>
        </div>
    </body>
</html>
'''
selector = lxml.html.fromstring(html1)
useful = selector.xpath('//div[@class="useful"]')
info_list = useful[0].xpath('ul/li/text()')
print(info_list)
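
The demo above only exercises rule (4) from the notes. As a rough sketch of rules (2), (3) and (5), the following runs starts-with, contains and string(.) against a small made-up page. The page content, the ids and the variable names are all assumptions chosen for illustration.

```python
import lxml.html

# Made-up page for illustration; the ids and text are assumptions.
page = '''
<html><body>
  <div id="test-1">alpha</div>
  <div id="test-2">beta</div>
  <div id="my-test">gamma</div>
  <div id="mixed">outer <span>inner <b>deep</b></span> tail</div>
</body></html>
'''
tree = lxml.html.fromstring(page)

# Rule (2): the id attribute starts with "test"
starts = tree.xpath('//div[starts-with(@id, "test")]/text()')

# Rule (3): the id attribute contains "test" anywhere
contains = tree.xpath('//div[contains(@id, "test")]/text()')

# Rule (5): all text under one div, nested tags included, as a single string
mixed = tree.xpath('//div[@id="mixed"]')[0]
full_text = mixed.xpath('string(.)')

print(starts)
print(contains)
print(full_text)
```

Here starts picks up only the two test-* divs, contains additionally matches my-test, and full_text joins the text of the div with the text of its nested span and b tags.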
 
 
 
# Open Zhihu, scroll to the bottom of the page, and pop up an alert
# browser.get("http://www.zhihu.com/explore")
# browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
# browser.execute_script('alert("To Bottom")')
 
# Open Taobao, type "ipad", clear it, type "MacBook Pro", and click search
# browser.get("http://www.taobao.com")
# input_str = browser.find_element(By.ID, 'q')
# input_str.send_keys("ipad")
# time.sleep(1)
# input_str.clear()
# input_str.send_keys("MacBook Pro")
# button = browser.find_element(By.CLASS_NAME, 'btn-search')
# button.click()
 
# Open a page and drag the draggable element onto the drop target
# url = "http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable"
# browser.get(url)
# browser.switch_to.frame('iframeResult')
# source = browser.find_element(By.CSS_SELECTOR, '#draggable')
# target = browser.find_element(By.CSS_SELECTOR, '#droppable')
# actions = ActionChains(browser)
# actions.drag_and_drop(source, target)
# actions.perform()
 
# Open a page, locate an element, and read one of its attributes
# url = 'https://www.zhihu.com/explore'
# browser.get(url)
# logo = browser.find_element(By.ID, 'zh-top-link-logo')
# print(logo)
# print(logo.get_attribute('class'))
 
# Get an element's internal ID, location, tag name, and size
# url = 'https://www.zhihu.com/explore'
# browser.get(url)
# element = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
# print(element.id)
# print(element.location)
# print(element.tag_name)
# print(element.size)
 
# Switch into a frame and back out again
# url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
# browser.get(url)
# browser.switch_to.frame('iframeResult')  # switch in
# source = browser.find_element(By.CSS_SELECTOR, '#draggable')
# print(source)
# try:
#     logo = browser.find_element(By.CLASS_NAME, 'logo')
# except NoSuchElementException:
#     print('NO LOGO')
#
# browser.switch_to.parent_frame()  # switch back out
# logo = browser.find_element(By.CLASS_NAME, 'logo')
# print(logo)
# print(logo.text)
 
# Implicit wait: each element lookup keeps polling for up to 10 seconds before raising NoSuchElementException
# browser.implicitly_wait(10)
# browser.get('https://www.zhihu.com/explore')
# element = browser.find_element(By.CLASS_NAME, 'zu-top-add-question')
# print(element)
 
# Explicit wait: wait for a specific condition on a specific element
# browser.get('https://www.taobao.com/')
# wait = WebDriverWait(browser, 10)
# search_input = wait.until(EC.presence_of_element_located((By.ID, 'q')))  # element is present in the DOM
# button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))  # element is clickable
# print(search_input, button)
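'''
Commonly used expected conditions:
title_is: the title equals the given text
title_contains: the title contains the given text
presence_of_element_located: the element is present in the DOM; pass a locator tuple such as (By.ID, 'p')
visibility_of_element_located: the element is visible; pass a locator tuple
visibility_of: the element is visible; pass an element object
presence_of_all_elements_located: at least one matching element is present; returns all of them
text_to_be_present_in_element: the element's text contains the given text
text_to_be_present_in_element_value: the element's value attribute contains the given text
frame_to_be_available_and_switch_to_it: the frame is available, and the driver switches to it
invisibility_of_element_located: the element is invisible or not in the DOM
element_to_be_clickable: the element is clickable
staleness_of: the element is no longer attached to the DOM; useful for detecting that the page has refreshed
element_to_be_selected: the element is selected; pass an element object
element_located_to_be_selected: the element is selected; pass a locator tuple
element_selection_state_to_be: pass an element object and the expected state; True if they match, otherwise False
element_located_selection_state_to_be: pass a locator tuple and the expected state; True if they match, otherwise False
alert_is_present: an alert is present
'''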
 
# Browser back and forward navigation
# browser = webdriver.Chrome()
# browser.get('https://www.baidu.com/')
# browser.get('https://www.taobao.com/')
# browser.get('https://www.zhihu.com/explore')
# browser.back()
# time.sleep(1)
# browser.forward()
# browser.close()
 
# Cookie operations
# browser.get('https://www.zhihu.com/explore')
# print(browser.get_cookies())  # read all cookies
# browser.add_cookie({'name': 'name', 'domain': 'www.zhihu.com', 'value': 'zhaofan'})  # add a cookie
# print(browser.get_cookies())
# browser.delete_all_cookies()  # delete all cookies
# print(browser.get_cookies())
 
# Switching between tabs
# browser.get('https://www.baidu.com')  # open Baidu (tab 1)
# browser.execute_script('window.open()')  # open a new tab (tab 2)
# print(browser.window_handles)
# browser.switch_to.window(browser.window_handles[1])  # switch to tab 2 and open Taobao
# browser.get('https://www.taobao.com')
# time.sleep(1)
# browser.switch_to.window(browser.window_handles[0])   # switch back to tab 1 and open Zhihu
# browser.get('https://www.zhihu.com/explore')
 
# Exception handling
# Selenium defines many exception types; the official reference is here:
# http://selenium-python.readthedocs.io/api.html#module-selenium.common.exceptions
 
# Handling timeout and element-not-found exceptions
# try:
#     browser.get('https://www.baidu.com')
# except TimeoutException:
#     print('Time Out')
# try:
#     browser.find_element(By.ID, 'hello')
# except NoSuchElementException:
#     print('No Element')
# finally:
#     browser.close()
