Fetching Image-Site Search Results with Selenium and Chrome

Author: 代码探险家 | 2024-06-29 04:20:31

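When looking for information, we usually turn to a search engine to find articles, images, music, and other resources. Some sites specialize in one kind of content, such as Baidu Images (百度图片) or Pixabay. This post shows how to use Selenium, the browser automation tool, from Python together with Chrome to search image websites for a given keyword and return the URLs of the search results pages.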

Installing Selenium and Chrome

Both need to be installed before use. Selenium can be installed with pip:

pip install selenium

Next, download and install Chrome. Selenium also needs a ChromeDriver build that matches the installed Chrome version. See the official Selenium documentation for the detailed steps.
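Once the driver is downloaded, the script has to know where it lives. The sketch below is one possible way to handle that: it checks an explicit path and then an environment variable, and if neither points to an existing file it passes no path at all, leaving Selenium to locate the driver itself (Selenium 4.6 and later ship with Selenium Manager, which can do this automatically). The names find_chromedriver and start_chrome and the CHROMEDRIVER_PATH variable are assumptions made for illustration, not part of Selenium.

```python
import os
from pathlib import Path
from typing import Optional


def find_chromedriver(explicit: Optional[str] = None,
                      env_var: str = "CHROMEDRIVER_PATH") -> Optional[str]:
    """Return the first candidate that is an existing file, else None.

    CHROMEDRIVER_PATH is a made-up variable name used for illustration.
    """
    for candidate in (explicit, os.environ.get(env_var)):
        if candidate and Path(candidate).is_file():
            return candidate
    return None


def start_chrome(driver_path: Optional[str] = None):
    """Start Chrome, passing a driver path only when one was found."""
    # Imported here so find_chromedriver() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = find_chromedriver(driver_path)
    # With no path, Selenium is left to locate the driver on its own.
    service = Service(executable_path=path) if path else Service()
    return webdriver.Chrome(service=service, options=webdriver.ChromeOptions())
```

Calling start_chrome() with no argument is enough on a machine where Selenium can find the driver by itself; a path only needs to be supplied when that lookup is not available.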

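Simulating user actions to get search results

With Selenium and Chrome installed, the following code searches image websites for a given keyword and returns the URLs of the search results pages.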

from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time


def get_word_pic_url(key_word):
    url1 = 'https://www.allhistory.com/painting'
    url2 = 'https://pixabay.com/zh/'

    # Path to the local ChromeDriver executable; adjust it for your machine
    s = Service(r'C:\Users\addoi\AppData\Local\Google\Chrome\Application\chromedriver.exe')

    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=s, options=options)

    try:
        driver.get(url1)

        # Type the keyword into the search box and click the search button
        time.sleep(2)
        input_box = driver.find_elements(By.XPATH, '//*[@id="header-search-box"]/input')[0]
        input_box.send_keys(key_word)
        time.sleep(3)
        search_button = driver.find_elements(By.XPATH, '//*[@id="header-search-box"]/div[2]/div')[0]
        search_button.click()
        time.sleep(10)

        # Join the current page's URL and the second site's URL into one string
        urls = driver.current_url + "\r\n" + url2

        # Open the second image site, type the keyword, and click the search button.
        # Disabled by default: the XPaths below are copied from the first site and
        # will probably need to be replaced with ones that match the second site.
        # driver.get(url2)
        # time.sleep(2)
        # input_box = driver.find_elements(By.XPATH, '//*[@id="header-search-box"]/input')[0]
        # input_box.send_keys(key_word)
        # time.sleep(3)
        # search_button = driver.find_elements(By.XPATH, '//*[@id="header-search-box"]/div[2]/div')[0]
        # search_button.click()
        # time.sleep(10)
        #
        # urls = urls + "\r\n" + driver.current_url

        print(f'Results URL for keyword {key_word}: {driver.current_url}')
        return urls
    finally:
        # Close the browser and stop the driver process, even if a step above fails
        driver.quit()

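The get_word_pic_url() function takes one parameter, key_word, the keyword to search for. Inside the function, it first defines the URLs of two image websites, url1 and url2, then starts Chrome through Selenium and opens the first site, url1. It locates the search box and the search button with find_elements() and XPath expressions, fills the keyword into the search box with send_keys(), and simulates a user clicking the search button with click().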

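It then reads the current page's URL from the current_url property and joins it with the second site's URL, url2, into a single string containing both URLs, which it returns. While the second block stays commented out, the url2 part is simply that site's home page rather than a search result. To get search results from the second site as well, uncomment the corresponding code; its XPath expressions are copies of the first site's and will probably need to be adapted.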
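Because each site has its own page structure, one way to support more than one site is to keep the URL and the two XPath expressions together per site and loop over them. The following is a minimal sketch under that assumption: SiteConfig, SITES, and search_sites are hypothetical names, the only entry filled in is the first site from the article, and the function returns a list of URLs instead of one joined string. The constant XPATH holds the same string value as Selenium's By.XPATH, so the sketch itself needs no Selenium import.

```python
import time
from dataclasses import dataclass
from typing import List

XPATH = "xpath"  # same string value as selenium's By.XPATH


@dataclass(frozen=True)
class SiteConfig:
    url: str           # page that contains the search box
    input_xpath: str   # XPath of the search input
    button_xpath: str  # XPath of the search button


SITES = [
    SiteConfig(
        url="https://www.allhistory.com/painting",
        input_xpath='//*[@id="header-search-box"]/input',
        button_xpath='//*[@id="header-search-box"]/div[2]/div',
    ),
    # A second entry would go here once its selectors have been checked in
    # the browser's developer tools; none are guessed in this sketch.
]


def search_sites(driver, key_word: str, sites: List[SiteConfig],
                 pause: float = 2.0) -> List[str]:
    """Run the keyword search on each site and collect the results-page URLs."""
    result_urls = []
    for site in sites:
        driver.get(site.url)
        time.sleep(pause)  # fixed delay, as in the article
        driver.find_elements(XPATH, site.input_xpath)[0].send_keys(key_word)
        time.sleep(pause)
        driver.find_elements(XPATH, site.button_xpath)[0].click()
        time.sleep(pause)
        result_urls.append(driver.current_url)
    return result_urls
```

With a driver created as in the article, search_sites(driver, "夕阳", SITES) would return one URL per configured site.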

Note that the function expects ChromeDriver at the path given in the code (here C:\Users\addoi\AppData\Local\Google\Chrome\Application\chromedriver.exe). It also needs the following imports:

from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time

The time module is used to add delays that give the page time to finish loading.
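Fixed delays either waste time or turn out too short on a slow connection. Selenium provides WebDriverWait (in selenium.webdriver.support.ui) for waiting on a condition instead. The sketch below illustrates the same idea in plain Python so it can be read and tried without a browser: it polls a lookup function until it returns something or a timeout expires. The helper name wait_for_first and its parameters are assumptions for illustration, not Selenium API.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def wait_for_first(find: Callable[[], List[T]],
                   timeout: float = 10.0,
                   poll: float = 0.5) -> T:
    """Call find() repeatedly until it returns a non-empty list.

    Returns the first item, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find()
        if found:
            return found[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(poll)


# Possible use inside get_word_pic_url(), replacing time.sleep(2) plus the lookup:
# input_box = wait_for_first(
#     lambda: driver.find_elements(By.XPATH, '//*[@id="header-search-box"]/input'))
```

The helper returns as soon as the element appears, so the common case is faster than a fixed sleep, while a missing element produces a clear TimeoutError instead of an IndexError from the [0] lookup.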

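Complete code

The following entry point, together with the get_word_pic_url() function above, makes up a complete example of fetching image-site search results with Selenium and Chrome: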

if __name__ == "__main__":
    key_word = "夕阳"  # "sunset"
    urls = get_word_pic_url(key_word)
    print("Search result URLs:\n", urls)

The example defines a keyword, key_word, calls get_word_pic_url() to get the search URLs for that keyword on the two image websites, and prints the result. Change the keyword as needed.
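Since get_word_pic_url() returns both URLs in one string separated by "\r\n", a caller that wants to process them one at a time has to split the string first. A minimal sketch of that step follows; split_result_urls is a hypothetical helper name, and the first URL in the sample is a placeholder rather than a real results page.

```python
from typing import List


def split_result_urls(joined: str) -> List[str]:
    """Split the newline-joined return value into a list of non-empty URLs."""
    return [line.strip() for line in joined.splitlines() if line.strip()]


# A string shaped like the function's return value
# (example.com is a placeholder, not a real result URL).
sample = "https://example.com/search?q=sunset" + "\r\n" + "https://pixabay.com/zh/"
for url in split_result_urls(sample):
    print(url)
```

str.splitlines() treats "\r\n" as a single line break, so no empty entries appear between the two URLs.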

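Disclaimer: This article was contributed by a user and does not represent the views of wpsshop博客. Copyright belongs to the original author, and this site assumes no legal liability. If you find infringing content, please contact us. When reposting, please credit the source: https://www.wpsshop.cn/w/代码探险家/article/detail/768235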