Puppeteer is a Node.js library which provides a high-level API to control Chrome/Chromium over the DevTools Protocol. Puppeteer runs in headless mode by default, but can be configured to run in full (non-headless) Chrome/Chromium.
Most things that you can do manually in the browser can be done using Puppeteer! Here are a few examples to get you started:
To use Puppeteer in your project, run:
npm i puppeteer
# or `yarn add puppeteer`
# or `pnpm i puppeteer`
When you install Puppeteer, it automatically downloads a recent version of Chromium (~170MB macOS, ~282MB Linux, ~280MB Windows) that is guaranteed to work with Puppeteer. For a version of Puppeteer that does not download a browser on install, see puppeteer-core.
Puppeteer uses several defaults that can be customized through configuration files.
For example, to change the default cache directory Puppeteer uses to install browsers, you can add a .puppeteerrc.cjs (or puppeteer.config.cjs) at the root of your application with the contents:
const {join} = require('path');

/**
 * @type {import("puppeteer").Configuration}
 */
module.exports = {
  // Changes the cache location for Puppeteer.
  cacheDirectory: join(__dirname, '.cache', 'puppeteer'),
};
After adding the configuration file, you will need to remove and reinstall puppeteer for it to take effect.
See Configuring Puppeteer for more information.
puppeteer-core
With every release since v1.7.0, we publish two packages:
puppeteer is a product for browser automation. When installed, it downloads a version of Chromium, which it then drives using puppeteer-core. Being an end-user product, puppeteer automates several workflows using reasonable defaults that can be customized.
puppeteer-core is a library to help drive anything that supports the DevTools protocol. Being a library, puppeteer-core is fully driven through its programmatic interface, which means no defaults are assumed and puppeteer-core will not download Chromium when installed.
You should use puppeteer-core if you are connecting to a remote browser or managing browsers yourself. If you are managing browsers yourself, you will need to call puppeteer.launch with an explicit executablePath (or channel if it's installed in a standard location).
When using puppeteer-core, remember to change the import:
import puppeteer from 'puppeteer-core';
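As a rough sketch of the two cases above (connecting to a remote browser versus launching one you manage yourself), the hypothetical helper `openBrowser` below picks between `puppeteer.connect` and `puppeteer.launch`. The helper and its `wsEndpoint` argument name are assumptions for illustration and are not part of Puppeteer; `browserWSEndpoint`, `executablePath`, and `channel` are the option names Puppeteer accepts. The module is passed in as a parameter so the sketch does not depend on which package is installed.

```javascript
// Hypothetical helper, not part of Puppeteer: decides how to obtain a browser
// when using puppeteer-core, which does not download Chromium on install.
// Returns whatever connect/launch returns (a Promise resolving to a Browser).
function openBrowser(puppeteer, {wsEndpoint, executablePath, channel} = {}) {
  if (wsEndpoint) {
    // Remote browser: attach to an instance that is already running.
    return puppeteer.connect({browserWSEndpoint: wsEndpoint});
  }
  if (executablePath) {
    // Self-managed browser at an explicit location on disk.
    return puppeteer.launch({executablePath});
  }
  if (channel) {
    // Self-managed browser installed in a standard location, e.g. 'chrome'.
    return puppeteer.launch({channel});
  }
  // Nothing to connect to or launch, so fail early with a clear message.
  throw new Error('Provide a wsEndpoint, executablePath, or channel');
}
```

With puppeteer-core imported as shown above, a call might look like `const browser = await openBrowser(puppeteer, {channel: 'chrome'});`.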
Puppeteer follows the latest maintenance LTS version of Node.
Puppeteer will be familiar to people using other browser testing frameworks. You launch/connect a browser, create some pages, and then manipulate them with Puppeteer's API.
For more in-depth usage, check our guides and examples.
The following example searches developers.google.com/web for articles tagged "Headless Chrome" and scrapes results from the results page.
import puppeteer from 'puppeteer';

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://developers.google.com/web/');

  // Type into the search box.
  await page.type('.devsite-search-field', 'Headless Chrome');

  // Wait for the suggest overlay to appear and click "show all results".
  const allResultsSelector = '.devsite-suggest-all-results';
  await page.waitForSelector(allResultsSelector);
  await page.click(allResultsSelector);

  // Wait for the results page to load and display the results.
  const resultsSelector = '.gsc-results .gs-title';
  await page.waitForSelector(resultsSelector);

  // Extract the results from the page.
  const links = await page.evaluate(resultsSelector => {
    return [...document.querySelectorAll(resultsSelector)].map(anchor => {
      const title = anchor.textContent.split('|')[0].trim();
      return `${title} - ${anchor.href}`;
    });
  }, resultsSelector);

  // Print all the links.
  console.log(links.join('\n'));

  await browser.close();
})();
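The string handling inside `page.evaluate` can be tried without a browser. The sketch below pulls it into a hypothetical function, `formatResult`, and feeds it a plain object in place of an anchor element. The function name, sample title, and sample URL are invented for illustration.

```javascript
// Hypothetical helper mirroring the callback passed to page.evaluate.
// `anchor` is any object with `textContent` and `href`, such as an <a> element.
function formatResult(anchor) {
  // Keep only the text before the first '|' and trim surrounding whitespace.
  const title = anchor.textContent.split('|')[0].trim();
  return `${title} - ${anchor.href}`;
}

// A plain object stands in for a DOM anchor; the values are made up.
const sample = {
  textContent: ' Getting Started with Headless Chrome | Web ',
  href: 'https://example.com/headless',
};

console.log(formatResult(sample));
// prints "Getting Started with Headless Chrome - https://example.com/headless"
```

Functions passed to `page.evaluate` run inside the page rather than in Node, so a helper like this would need to be defined within the callback itself to be used there.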
1. Uses Headless mode
Puppeteer launches Chromium in headless mode. To launch a full version of Chromium, set the headless option when launching a browser:
const browser = await puppeteer.launch({headless: false}); // default is true
2. Runs a bundled version of Chromium
By default, Puppeteer downloads and uses a specific version of Chromium so its API is guaranteed to work out of the box. To use Puppeteer with a different version of Chrome or Chromium, pass in the executable's path when creating a Browser instance:
const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
You can also use Puppeteer with Firefox Nightly (experimental support). See Puppeteer.launch for more information.
See this article for a description of the differences between Chromium and Chrome. This article describes some differences for Linux users.
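The two launch settings described so far, `headless` and `executablePath`, are often chosen per environment. As a minimal sketch, the hypothetical helper below builds the options object for `puppeteer.launch` from environment variables. The helper and the variable names `MY_APP_HEADFUL` and `MY_APP_CHROME_PATH` are assumptions for illustration; Puppeteer itself does not read them.

```javascript
// Hypothetical helper, not part of Puppeteer: maps environment variables
// (names invented for this sketch) to puppeteer.launch options.
function launchOptionsFromEnv(env) {
  const options = {
    // Headless stays on (the default) unless MY_APP_HEADFUL=1 is set.
    headless: env.MY_APP_HEADFUL !== '1',
  };
  if (env.MY_APP_CHROME_PATH) {
    // Use a specific Chrome/Chromium binary instead of the bundled one.
    options.executablePath = env.MY_APP_CHROME_PATH;
  }
  return options;
}

// With no variables set, only the default headless setting is produced.
console.log(launchOptionsFromEnv({}));
```

The result can then be passed straight through, for example `const browser = await puppeteer.launch(launchOptionsFromEnv(process.env));`.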
3. Creates a fresh user profile
Puppeteer creates its own browser user profile which it cleans up on every run.
See our guide on using Docker.
See our guide on using Chrome extensions.
Check out our contributing guide to get an overview of Puppeteer development.