This n8n workflow automates discovering, crawling, and extracting structured data from a website’s sitemap, then stores the results in Google Sheets. It runs in two modes: initial site crawling, and ongoing AI-powered Q&A over the stored data through a chat interface. The workflow first checks whether a site has already been indexed. If it has not, the workflow validates the URL, fetches robots.txt, extracts the sitemap URL, and crawls every page listed in the sitemap. For each page it extracts key data such as the language, H1 hierarchy, internal and external links, and a page summary, and writes this data to a Google Sheet. It’s well suited to website owners who want to automate SEO audits, generate site summaries, or enable AI-driven site analysis and customer support.
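The discovery phase described above (validate the URL, read robots.txt, locate the sitemap, list its pages) can be sketched in plain Python as below. This is a minimal, network-free illustration under stated assumptions, not the template’s own logic: the function names are hypothetical, bare hostnames are assumed to mean `https`, and `/sitemap.xml` is assumed as a fallback when robots.txt declares no `Sitemap:` line. In the workflow itself, fetching would presumably be handled by the `httpRequest` nodes and XML parsing by the `xml` node.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse


def normalize_site_url(raw: str) -> str:
    """Validate a user-supplied site URL and return its scheme://host root.

    Assumption (hypothetical rule): a bare hostname such as "example.com"
    is treated as https.
    """
    candidate = raw.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid website URL: {raw!r}")
    return f"{parsed.scheme}://{parsed.netloc}"


def sitemap_urls_from_robots(robots_txt: str, site_root: str) -> list[str]:
    """Collect every 'Sitemap:' directive from a robots.txt body.

    Assumption: fall back to /sitemap.xml when robots.txt lists none.
    """
    found = []
    for line in robots_txt.splitlines():
        # partition splits at the first colon, so "https://..." stays intact
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            # urljoin resolves relative values against the site root
            found.append(urljoin(site_root + "/", value.strip()))
    return found or [site_root + "/sitemap.xml"]


def _local_name(tag: str) -> str:
    # Drop an "{namespace}" prefix so sitemaps with or without xmlns both parse
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(sitemap_xml: str) -> tuple[list[str], list[str]]:
    """Return (page URLs, nested sitemap URLs) from one sitemap document.

    A <urlset> lists pages; a <sitemapindex> lists further sitemaps to fetch.
    """
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.iter()
        if _local_name(el.tag) == "loc" and el.text and el.text.strip()
    ]
    if _local_name(root.tag) == "sitemapindex":
        return [], locs
    return locs, []


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /admin\nSitemap: /sitemap.xml\n"
    site = normalize_site_url("example.com")
    print(sitemap_urls_from_robots(robots, site))  # ['https://example.com/sitemap.xml']
```

Returning nested sitemaps separately from page URLs lets a caller fetch sitemap-index entries recursively before it starts crawling individual pages.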
Automated Website Crawling, Data Extraction, and Q&A Workflow
| Detail | Value |
|---|---|
| Node Count | >20 Nodes |
| Nodes Used | @n8n/n8n-nodes-langchain.agent, @n8n/n8n-nodes-langchain.chat, @n8n/n8n-nodes-langchain.chatTrigger, @n8n/n8n-nodes-langchain.lmChatOpenAi, @n8n/n8n-nodes-langchain.memoryBufferWindow, @n8n/n8n-nodes-langchain.openAi, @n8n/n8n-nodes-langchain.outputParserStructured, code, googleSheets, googleSheetsTool, httpRequest, httpRequestTool, if, markdown, set, splitInBatches, splitOut, stickyNote, stopAndError, xml |
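The per-page extraction named in the description (language, H1 hierarchy, internal and external links) could look roughly like the following sketch, the kind of logic one might place in the `code` node listed above. It is illustrative only: the class and function names are hypothetical, it uses Python’s standard `html.parser` rather than whatever the template actually does, it records every heading level (h1 to h6) as the outline, and it treats a link as internal only when its host exactly matches the page’s host, so `www.` variants count as external. Page summaries, presumably produced by the OpenAI nodes, are out of scope here.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urldefrag, urljoin, urlparse

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageDataParser(HTMLParser):
    """Collect the <html lang>, heading outline, and raw hrefs from a page."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.headings: list[tuple[int, str]] = []  # (level, text) in document order
        self.hrefs: list[str] = []
        self._open_heading: Optional[str] = None
        self._heading_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "html" and attr_map.get("lang"):
            self.lang = attr_map["lang"]
        elif tag in HEADING_TAGS:
            self._open_heading = tag
            self._heading_parts = []
        elif tag == "a" and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])

    def handle_data(self, data):
        # Text inside an open heading (including nested tags like <b>) is buffered
        if self._open_heading:
            self._heading_parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            # Collapse runs of whitespace into single spaces
            text = " ".join("".join(self._heading_parts).split())
            self.headings.append((int(tag[1]), text))
            self._open_heading = None


def extract_page_data(html: str, page_url: str) -> dict:
    """Return language, headings, and deduplicated internal/external links."""
    parser = PageDataParser()
    parser.feed(html)
    parser.close()
    page_host = urlparse(page_url).netloc.lower()
    internal, external = set(), set()
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts before classifying
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar links
        target = internal if parsed.netloc.lower() == page_host else external
        target.add(absolute)
    return {
        "url": page_url,
        "language": parser.lang,
        "headings": parser.headings,
        "internal_links": sorted(internal),
        "external_links": sorted(external),
    }


if __name__ == "__main__":
    sample = '<html lang="en"><h1>Acme</h1><a href="/about">About</a></html>'
    print(extract_page_data(sample, "https://acme.com/"))
```

Each returned dictionary maps naturally onto one spreadsheet row, with the link lists and heading outline serialized however the sheet’s columns expect.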