Automated Web Data Extraction and Social Media Scraping Workflow

This n8n workflow automates the extraction of web content, URLs, and social media profiles from specified websites, primarily company websites. It starts from manual triggers, where users supply either target URLs or a database of company records. For each website it retrieves all URLs and the page text, removes duplicate and invalid links, and uses an AI-powered crawler agent to find links to the company's social media profiles. It also fetches company data from a database (Supabase in this workflow), merges it with the crawl results, and writes the extracted social media URLs back to the database.

Practical use cases include monitoring a company's online presence, collecting social media contacts for marketing or research, and automating web content analysis for business intelligence. Notes inside the workflow explain how to improve crawling accuracy, for example by routing requests through proxies, and the workflow can be adapted to other data scraping tasks.
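The workflow performs link collection and cleanup with n8n nodes (HTML, Remove Duplicates, Filter) and relies on an AI agent to identify social profiles. The Python sketch below approximates only the deterministic part: collecting `<a href>` links from a page, resolving them to absolute URLs, dropping duplicates and non-HTTP links, and matching hosts against a list of social media domains. Plain domain matching stands in for the AI crawler here. The domain list and the function names (`extract_links`, `social_profiles`) are hypothetical and illustrative, not part of the workflow.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical domain -> platform map; extend it for the platforms you track.
SOCIAL_DOMAINS = {
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "youtube.com": "youtube",
    "tiktok.com": "tiktok",
}


class LinkCollector(HTMLParser):
    """Collects the raw href attribute of every <a> tag in the document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute http(s) links, deduplicated in first-seen order."""
    parser = LinkCollector()
    parser.feed(html)
    seen: set[str] = set()
    links: list[str] = []
    for href in parser.hrefs:
        # Resolve relative paths, then drop the #fragment so that
        # /about and /about#team count as the same page.
        absolute = urldefrag(urljoin(base_url, href.strip())).url
        # mailto:, tel:, javascript: and similar schemes are treated as invalid.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def social_profiles(links: list[str]) -> dict[str, list[str]]:
    """Group links whose host is (a subdomain of) a known social media domain."""
    found: dict[str, list[str]] = {}
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        for domain, platform in SOCIAL_DOMAINS.items():
            if host == domain or host.endswith("." + domain):
                found.setdefault(platform, []).append(link)
                break
    return found


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<a href="https://www.linkedin.com/company/acme">LinkedIn</a>'
        '<a href="mailto:hi@acme.com">Mail</a>'
        '<a href="/about#team">Team</a>'
    )
    links = extract_links(page, "https://acme.com/")
    print(links)  # ['https://acme.com/about', 'https://www.linkedin.com/company/acme']
    print(social_profiles(links))  # {'linkedin': ['https://www.linkedin.com/company/acme']}
```

In the real workflow the resulting profile URLs would be written back to the company's database row; in this sketch they are simply returned as a dictionary keyed by platform.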

Node Count	>20 Nodes
Nodes Used	@n8n/n8n-nodes-langchain.agent, @n8n/n8n-nodes-langchain.lmChatOpenAi, @n8n/n8n-nodes-langchain.outputParserStructured, @n8n/n8n-nodes-langchain.toolWorkflow, aggregate, filter, html, httpRequest, manualTrigger, markdown, merge, removeDuplicates, set, splitOut, stickyNote, supabase