Automating Web Interactions with Puppeteer
Puppeteer is a Node.js library that provides a high-level API for controlling Chrome or Chromium, making it well suited for automating interactions with web pages. In this article, we'll walk through a basic example of using Puppeteer to search for a keyword on GitHub and fetch the title of the first result.
Setting Up Puppeteer and Node.js
To get started, let's initialize a Node.js project and install the required packages. Create a new folder and navigate to it in your terminal. Run `npm init` to generate a `package.json` file, then install Puppeteer with `npm install puppeteer`.
Creating a Service File
Create a new file named `service.mjs` and add the following code to launch a Chrome instance and navigate to a URL:
```javascript
import puppeteer from 'puppeteer';

// Launch a Chrome instance (visible, since headless is false) and open the given URL.
export async function scrapePage(url) {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  // ...
}
```
Inspecting the Page
To interact with the page, we need to manually inspect it and identify the DOM elements to target. Open GitHub in a browser and inspect the search input field at the top of the page. We can use the `.header-search-input` class name to target the element.
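Before writing any Puppeteer code, it is worth confirming the selector in the browser's DevTools console; this is just a hypothetical quick check, not part of the scraper itself:

```javascript
// Run in the DevTools console on github.com: should return the <input> element, not null.
document.querySelector('.header-search-input');
```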
Targeting Elements with Puppeteer
Using Puppeteer, we can focus the input field element and simulate typing. We'll use the `waitForSelector` method to ensure the element is rendered on the page and ready for interaction.
```javascript
async function scrapePage(url) {
  // ...
  // Wait for the search input to be visible, type a query, and submit it.
  await page.waitForSelector('.header-search-input', { visible: true });
  await page.focus('.header-search-input');
  await page.keyboard.type('react');
  await page.keyboard.press('Enter');
  await page.waitForNavigation();
  // ...
}
```
Scraping Data
After navigating to the search results page, we can scrape the title of the first result using the `page.evaluate` method.
```javascript
async function scrapePage(url) {
  // ...
  // Wait for the results list, then read the title of the first repository in it.
  const repoList = await page.waitForSelector('.repo-list');
  const title = await page.evaluate((repoList) => {
    const repo = repoList.querySelector('li');
    return repo.querySelector('.f4.text-normal').innerText;
  }, repoList);
  await browser.close();
  return title;
}
```
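To sanity-check the scraper before putting a server in front of it, we can call it directly. The snippet below is a minimal sketch: it assumes `scrapePage` is exported from `service.mjs` as shown above, and the file name `test.mjs` is arbitrary.

```javascript
// test.mjs — standalone test of the scraper (run with: node test.mjs)
import { scrapePage } from './service.mjs';

const title = await scrapePage('https://github.com/search?q=react');
console.log(title); // should print the name of the first repository in the results
```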
Creating an Express Server
To serve the scraped data, we’ll create an Express server with a single endpoint. The endpoint will capture the keyword as a route parameter and call the scrapePage
function to fetch the data.
```javascript
import express from 'express';
import { scrapePage } from './service.mjs';

const app = express();

// GET /:keyword — scrape GitHub search results for the given keyword.
app.get('/:keyword', async (req, res) => {
  const keyword = req.params.keyword;
  try {
    const title = await scrapePage(`https://github.com/search?q=${keyword}`);
    res.send(title);
  } catch (error) {
    res.status(500).send(error.message);
  }
});

app.listen(3000, () => {
  console.log('Server listening on port 3000');
});
```
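Once the server is running, the endpoint can be exercised with a quick request. The snippet below is a sketch assuming Node 18+ (where `fetch` is available globally) and an ES module context for top-level `await`:

```javascript
// Smoke test against the local server started above.
const response = await fetch('http://localhost:3000/react');
console.log(await response.text()); // the title of the first "react" result on GitHub
```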
Deploying to Google Cloud Functions
To deploy our service to a serverless cloud function, we'll create a new file named `index.js` and modify the code to export the Express app object.
```javascript
import express from 'express';
import { scrapePage } from './service.mjs';

const app = express();

app.get('/:keyword', async (req, res) => {
  // ...
});

export default app;
```
We'll also update the `package.json` file to include the required dependencies and set the `type` to `module`.

```json
{
  "name": "puppeteer-example",
  "version": "1.0.0",
  "type": "module",
  "dependencies": {
    "express": "^4.17.1",
    "puppeteer": "^13.0.1"
  }
}
```
Finally, we'll deploy our cloud function to Google Cloud Functions, setting the entry point to the app exported from the `index.js` file. We can then test our cloud function by invoking the trigger URL, which returns the title of the first repository in the list.
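If the runtime does not pick up the default export automatically, an explicit alternative is to register the app with Google's Functions Framework. The following is only a sketch, assuming the `@google-cloud/functions-framework` package is added to the dependencies; the entry point name `app` is arbitrary.

```javascript
// index.js — alternative wiring: register the Express app as a named HTTP function.
import { http } from '@google-cloud/functions-framework';
import express from 'express';
import { scrapePage } from './service.mjs';

const app = express();

app.get('/:keyword', async (req, res) => {
  try {
    const title = await scrapePage(`https://github.com/search?q=${req.params.keyword}`);
    res.send(title);
  } catch (error) {
    res.status(500).send(error.message);
  }
});

// "app" is the value to pass as the entry point when deploying the function.
http('app', app);
```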