Getting Started with Puppeteer in Node.js

Sat Aug 17 2024 · 3 min read

Modern web development often requires testing and automating various web applications and processes. Traditionally, these tasks were complex, involving multiple tools and a steep learning curve. Enter Puppeteer, a powerful Node.js library created by Google that simplifies browser automation. If you’re looking to scrape web data, run end-to-end tests, or automate repetitive browser tasks, Puppeteer may be exactly the tool you need. Let’s dive into getting started with Puppeteer and see how you can leverage it in your Node.js projects.

What is Puppeteer?

Before we get into the nitty-gritty of coding, it’s helpful to understand what Puppeteer is. Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. It runs headless (with no visible browser window) by default, but it can also run in full (non-headless) mode. In short, Puppeteer is a heavyweight champ for web scraping, automating routine web tasks, generating screenshots and PDFs of webpages, and more.
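Switching between the two modes is a single launch option. The sketch below assumes you want to flip it with an environment variable; the variable name PUPPETEER_DEBUG and the launchOptions and run helpers are our own choices, not part of Puppeteer. The headless and slowMo options are real launch options: slowMo delays each Puppeteer operation by the given number of milliseconds, which makes a visible run easier to follow.

```javascript
// Sketch: choose headless or full (headed) mode at launch time.
// PUPPETEER_DEBUG is a hypothetical variable name; Puppeteer itself does not read it.
function launchOptions(env = process.env) {
    const debug = env.PUPPETEER_DEBUG === "1"
    return {
        headless: !debug, // true: no visible window; false: open a real browser window
        slowMo: debug ? 100 : 0, // slow each operation by 100 ms while watching
    }
}

async function run(url) {
    // Required lazily so launchOptions can be used without loading Puppeteer
    const puppeteer = require("puppeteer")
    const browser = await puppeteer.launch(launchOptions())
    try {
        const page = await browser.newPage()
        await page.goto(url)
        console.log(await page.title())
    } finally {
        await browser.close()
    }
}

// Usage (shell): PUPPETEER_DEBUG=1 node index.js
// run("https://github.com").catch(console.error)
```

Keeping the option-building in a plain function lets the same script run headless on a server and headed on your own machine without code changes.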

Setting Up Puppeteer

To get started with Puppeteer, you’ll need Node.js installed on your machine. If you haven’t already, head to the official Node.js website and download a current LTS release; recent Puppeteer versions do not support older Node.js releases. Once Node.js is set up, you can initialize a new project and install Puppeteer.

Open your terminal and run the following commands:

mkdir puppeteer-demo
cd puppeteer-demo
npm init -y
npm install puppeteer

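This creates a new directory for your project, initializes a package.json file, and installs Puppeteer. The install step also downloads a compatible build of Chrome for Puppeteer to drive, so it can take a minute.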

Basic Usage

Let’s write a simple script to see Puppeteer in action. The following example will launch a browser, open a new page, navigate to a website (let’s say GitHub), and take a screenshot.

Create a file named index.js and add the following code:

const puppeteer = require("puppeteer")
;(async () => {
    // Launch a browser
    const browser = await puppeteer.launch()
    const page = await browser.newPage()

    // Navigate to GitHub
    await page.goto("https://github.com")

    // Take a screenshot
    await page.screenshot({ path: "github.png" })

    // Close the browser
    await browser.close()
})()

To run this script, go back to your terminal and execute:

node index.js

If all goes well, you should see a github.png file in your directory. This file is a screenshot of the GitHub homepage.
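By default the screenshot covers only the viewport, which Puppeteer sets to 800 by 600 pixels. As a sketch, the same pattern can produce a full-page screenshot and a PDF. setViewport, the fullPage screenshot option, and page.pdf() are Puppeteer APIs; the 1280 by 800 viewport and the outputName file-naming helper are assumptions made for this example.

```javascript
// Sketch: save a full-page screenshot and a PDF of the same page.
// outputName is a hypothetical naming convention, not part of Puppeteer.
function outputName(url, extension) {
    const { hostname, pathname } = new URL(url)
    // "https://github.com/trending" -> "github.com-trending"
    const slug = (hostname + pathname).replace(/\/+$/, "").replace(/[^a-z0-9.-]+/gi, "-")
    return `${slug}.${extension}`
}

async function capture(url) {
    // Required lazily so outputName can be used without loading Puppeteer
    const puppeteer = require("puppeteer")
    const browser = await puppeteer.launch()
    try {
        const page = await browser.newPage()
        await page.setViewport({ width: 1280, height: 800 }) // assumed desktop size
        await page.goto(url, { waitUntil: "networkidle2" })
        // fullPage captures the whole scrollable page, not just the viewport
        await page.screenshot({ path: outputName(url, "png"), fullPage: true })
        // page.pdf() renders with print CSS; printBackground keeps background colors
        await page.pdf({ path: outputName(url, "pdf"), format: "A4", printBackground: true })
    } finally {
        await browser.close()
    }
}

// Usage: capture("https://github.com").catch(console.error)
```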

Advanced Features

Now that you have a basic understanding of Puppeteer, let’s explore some advanced features.

Handling Forms

Suppose you want to automate form submission. Puppeteer can help you with that. Let’s continue with an example of logging into a website. For demonstration purposes, we’ll use a fictitious login form.

Update your index.js file to include the following code:

const puppeteer = require("puppeteer")
;(async () => {
    const browser = await puppeteer.launch({ headless: false }) // Show the browser window so you can watch each step
    const page = await browser.newPage()

    await page.goto("https://example.com/login") // Replace with actual login URL

    // Fill in the login form
    await page.type("#username", "your-username") // Replace with actual form selectors and username
    await page.type("#password", "your-password") // Replace with actual form selectors and password

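    // Submit the form. Start waiting for the navigation before clicking,
    // so a fast redirect cannot finish before Puppeteer starts listening.
    await Promise.all([
        page.waitForNavigation(), // Wait for navigation to finish
        page.click("#login-button"), // Replace with actual form button selector
    ])

    // The page navigated; this alone does not prove the credentials were accepted
    console.log("Login successful!")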

    await browser.close()
})()

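This script opens a visible browser window, navigates to the login page, fills in the form fields, and submits the form. Once the page has navigated after the submit, “Login successful!” is printed to your console. Note that the script does not actually check whether the login worked: a rejected password that reloads the page prints the same message.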
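The example above hard-codes the credentials and prints its message without checking the result. A minimal sketch of a safer variant, assuming hypothetical LOGIN_USER and LOGIN_PASS environment variables and treating “the browser left the login page” as success (a heuristic you should adapt to your site), might look like this. The selectors are the same placeholders as before.

```javascript
// Sketch: credentials from the environment plus an explicit success check.
// LOGIN_USER, LOGIN_PASS, the URL, and the selectors are all hypothetical.
function readCredentials(env = process.env) {
    const username = env.LOGIN_USER
    const password = env.LOGIN_PASS
    if (!username || !password) {
        throw new Error("Set LOGIN_USER and LOGIN_PASS before running")
    }
    return { username, password }
}

// Heuristic: success means the path changed away from the login page
function leftLoginPage(currentUrl, loginUrl) {
    return new URL(currentUrl).pathname !== new URL(loginUrl).pathname
}

async function login(loginUrl) {
    const { username, password } = readCredentials()
    const puppeteer = require("puppeteer") // loaded lazily; the helpers above don't need it
    const browser = await puppeteer.launch({ headless: false })
    try {
        const page = await browser.newPage()
        await page.goto(loginUrl)
        await page.type("#username", username)
        await page.type("#password", password)
        // Listen for the navigation before clicking so it cannot be missed
        await Promise.all([page.waitForNavigation(), page.click("#login-button")])
        console.log(leftLoginPage(page.url(), loginUrl) ? "Login successful!" : "Login failed")
    } finally {
        await browser.close()
    }
}

// Usage (shell): LOGIN_USER=me LOGIN_PASS=secret node index.js
// login("https://example.com/login").catch(console.error)
```

If the form does not trigger a navigation at all (some sites validate inline), page.waitForNavigation() times out after 30 seconds by default and throws, so on such sites you may need to wait for a selector instead.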

Scraping Data

Puppeteer is also fantastic for web scraping. Here’s an example of extracting data from a webpage. Let’s scrape the titles of trending repositories on GitHub.

Update your index.js file:

const puppeteer = require("puppeteer")
;(async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()

    await page.goto("https://github.com/trending")

    // Scrape repository titles
    const repoTitles = await page.evaluate(() => {
        const repos = Array.from(document.querySelectorAll("h1.h3.lh-condensed"))
        return repos.map((repo) => repo.innerText.trim())
    })

    console.log("Trending Repositories:", repoTitles)

    await browser.close()
})()

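This code navigates to the GitHub trending page, extracts the titles of trending repositories, and prints them to your console. The selector reflects GitHub’s markup when this article was written; if you get an empty array, inspect the page in your browser’s DevTools and update the selector.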
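A slightly sturdier version of the scraper might wait for the titles to appear, tidy the text, and save the results to a file. This is a sketch: page.waitForSelector() and page.$$eval() are Puppeteer APIs, while the trending.json file name and the normalizeRepoTitle helper are our own, and the cleanup assumes GitHub still renders each title as “owner / repo”.

```javascript
const fs = require("fs")

// Selector from the example above; GitHub's markup changes over time,
// so confirm it in DevTools if the script finds nothing.
const TITLE_SELECTOR = "h1.h3.lh-condensed"

// Collapse the whitespace around the slash, e.g. "owner /\n    repo" -> "owner/repo"
// (assumes the owner / repo layout; repository names contain no spaces)
function normalizeRepoTitle(text) {
    return text.replace(/\s+/g, "")
}

async function scrapeTrending(outFile = "trending.json") {
    const puppeteer = require("puppeteer") // loaded lazily; normalizeRepoTitle doesn't need it
    const browser = await puppeteer.launch()
    try {
        const page = await browser.newPage()
        await page.goto("https://github.com/trending")
        // Throws a TimeoutError after 10 s if the selector matches nothing
        await page.waitForSelector(TITLE_SELECTOR, { timeout: 10000 })
        // $$eval runs the callback inside the page on every matching element
        const raw = await page.$$eval(TITLE_SELECTOR, (els) => els.map((el) => el.textContent))
        const titles = raw.map(normalizeRepoTitle)
        fs.writeFileSync(outFile, JSON.stringify(titles, null, 2))
        return titles
    } finally {
        await browser.close()
    }
}

// Usage: scrapeTrending().then(console.log).catch(console.error)
```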

Best Practices

While Puppeteer is an amazing tool, it’s essential to follow some best practices to get the most out of it:

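  1. Error Handling: Wrap automation steps in try/catch blocks (or attach .catch to your async entry point) so failures are reported clearly instead of surfacing as unhandled promise rejections.
  2. Resource Management: Always close the browser (await browser.close()), ideally in a finally block so it also happens when a step throws; otherwise stray Chrome processes keep running and consuming memory.
  3. Timeouts: Puppeteer waits up to 30 seconds by default; adjust this to your network conditions with page.setDefaultTimeout(), page.setDefaultNavigationTimeout(), or the timeout option accepted by calls such as page.goto().
  4. Headless Mode: Use headless mode (the default) for most tasks for better performance and reliability, and switch to headless: false only when you need to watch or debug a run.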
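The practices above can be combined into one small helper. In this sketch, withBrowser is our own name, not a Puppeteer API: it applies default timeouts with page.setDefaultTimeout() and page.setDefaultNavigationTimeout(), always closes the browser in a finally block, and leaves error reporting to the caller. The launch parameter is an assumption added so the helper can be tried with a stand-in browser.

```javascript
// Sketch: run a task with a fresh page, sensible timeouts, and guaranteed cleanup.
// withBrowser and its options are hypothetical names, not part of Puppeteer.
async function withBrowser(task, { timeoutMs = 15000, launch } = {}) {
    const launchBrowser = launch || (() => require("puppeteer").launch({ headless: true }))
    const browser = await launchBrowser()
    try {
        const page = await browser.newPage()
        page.setDefaultTimeout(timeoutMs) // waitForSelector, click, and similar waits
        page.setDefaultNavigationTimeout(timeoutMs) // goto, waitForNavigation, and similar
        return await task(page)
    } finally {
        // Runs on success and on error, so no browser process is left behind
        await browser.close()
    }
}

// Usage:
// withBrowser(async (page) => {
//     await page.goto("https://github.com")
//     return page.title()
// })
//     .then((title) => console.log(title))
//     .catch((err) => {
//         console.error("Automation failed:", err.message)
//         process.exitCode = 1
//     })
```

The await in return await task(page) matters: without it, the finally block would close the browser before the task's promise settles.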

Conclusion

With its robust and straightforward API, Puppeteer makes it easier to automate and test your web applications. Whether you’re aiming to scrape data, generate screenshots, or automate form submissions, Puppeteer provides the tools you need to get the job done efficiently. So arm yourself with this versatile library and take your Node.js projects to new heights.

Happy coding!
