Automated Web Content Scraping to Google Drive

This n8n workflow scrapes web content from a list of URLs, converts it to markdown, and saves it as documents in Google Drive. It is designed to streamline content collection for knowledge bases, competitive analysis, or website migration projects. The workflow starts from either a chat trigger or manual input, reads the URLs from a Google Sheets template, and scrapes each page in sequence using the Firecrawl API. For each valid URL, the content is extracted as markdown and saved as an individual Google Doc in a specified Drive folder. The workflow then writes a status marker back to the Google Sheet to track progress and prevent duplicate scraping. When all URLs have been processed, it returns a webhook response containing a link to the folder of scraped content, which serves as the end-of-process notification.
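The control flow described above can be sketched outside n8n as a short Python loop. This is only an illustration of the logic, not the workflow's actual implementation: the `Row` type, the `"done"` and `"invalid"` status values, and the `scrape` and `save_doc` callables are all hypothetical stand-ins for the Google Sheets rows, the status markers, the Firecrawl node, and the Google Drive node.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlparse


@dataclass
class Row:
    """One sheet row. The field names and status values are assumptions."""
    url: str
    status: str = ""  # "" = pending, "done" = scraped, "invalid" = skipped


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def process_rows(
    rows: List[Row],
    scrape: Callable[[str], str],         # stand-in for the Firecrawl step
    save_doc: Callable[[str, str], str],  # stand-in for the Drive step
) -> List[str]:
    """Scrape each pending row once and return the saved document references."""
    saved = []
    for row in rows:
        if row.status == "done":
            continue  # already scraped: this is what prevents duplicates
        if not is_valid_url(row.url):
            row.status = "invalid"
            continue
        markdown = scrape(row.url)
        saved.append(save_doc(row.url, markdown))
        row.status = "done"  # status marker written back for tracking
    return saved
```

Passing the scrape and save steps in as callables keeps the sketch free of any assumptions about the Firecrawl or Google APIs. Running the function a second time over the same rows does nothing, because every processed row already carries a status marker.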

Node Count

6 – 10 Nodes

Nodes Used

@mendable/n8n-nodes-firecrawl.firecrawl, @n8n/n8n-nodes-langchain.chatTrigger, filter, googleDrive, googleSheets, if, respondToWebhook, splitInBatches, stickyNote
