Automated Recursive Web Scraper for Content Collection


This n8n workflow automates web scraping across multiple linked pages, following links recursively to a configurable depth. Designed to streamline content aggregation, it receives seed URLs and parameters through a form submission or from another workflow, then creates a new Google Sheet to track URLs and a Google Doc to store scraped content.

The workflow scrapes the initial page, extracts its content, and saves it into the document. It then discovers internal links on that page, filters them against user-defined criteria, and appends the filtered links to the sheet for later scraping. This cycle repeats down to the specified depth, enabling deep content extraction across many pages. Along the way it flags links that have already been scraped and maintains logs, making it suitable for research, lead generation, and digital content collection. Because execution can be triggered by a form submission or called from other workflows, it integrates flexibly into larger automation setups.

Node Count

>20 Nodes

Nodes Used

airtop, code, executeWorkflowTrigger, formTrigger, googleDocs, googleSheets, if, set, stickyNote

Reviews

There are no reviews yet.
