Webpage Content Extraction with Firecrawl API

This n8n workflow automates crawling web pages, converting their HTML content to markdown, and extracting all links for further analysis. It starts with a manual trigger, waits briefly to respect API rate limits, and then fetches URLs from a specified data source or array. URLs are processed in batches of 40, each batch split into smaller chunks of 10 requests to stay within API limits. Each URL is sent to the Firecrawl API, which returns the page's markdown content and links. The extracted data is then structured with relevant metadata and stored in a destination of your choice, such as Airtable. This workflow is well suited to content analysis, SEO auditing, and preparing web content for AI processing, especially when handling large volumes of URLs under API rate limits.
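The batching and request-building steps can be sketched outside of n8n as plain JavaScript. This is a minimal illustration, not the workflow itself: the endpoint URL and request-body shape assume Firecrawl's v1 `/scrape` API (`formats: ["markdown", "links"]`), so verify them against Firecrawl's current documentation before use.

```javascript
// Assumed Firecrawl v1 scrape endpoint; confirm against the official docs.
const FIRECRAWL_ENDPOINT = "https://api.firecrawl.dev/v1/scrape";

// Split an array into fixed-size chunks, mirroring the workflow's
// splitInBatches step (batches of 40, then chunks of 10).
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build the request body for one URL: markdown content plus page links.
function buildScrapeBody(url) {
  return { url, formats: ["markdown", "links"] };
}

// Example: 25 URLs chunked into groups of 10 -> sizes [10, 10, 5].
const urls = Array.from({ length: 25 }, (_, i) => `https://example.com/page-${i}`);
const batches = chunk(urls, 10);
console.log(batches.map((b) => b.length)); // [10, 10, 5]
console.log(JSON.stringify(buildScrapeBody(urls[0])));
```

In the actual workflow, each body would be POSTed to the Firecrawl endpoint by the httpRequest node (with your API key in an `Authorization` header), and a wait node would pause between chunks to respect the rate limit.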

Node Count

11 – 20 Nodes

Nodes Used

httpRequest, limit, manualTrigger, noOp, set, splitInBatches, splitOut, stickyNote, wait
