Automated News Scraping, Summarization, and Storage Workflow

This n8n workflow automates extracting the latest news articles from a website that has no RSS feed, summarizing each article, identifying key technical keywords, and storing the enriched data in a database. It runs weekly via a cron trigger, making it well suited to continuously monitoring news updates on specific sites.

The workflow begins with a schedule trigger that initiates the process weekly. It retrieves the HTML content of the news page, then uses CSS selectors to extract the links, dates, titles, and content of the individual posts. It then filters the posts to keep only those published within the last 70 days, focusing on the most recent content.
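As a rough sketch of that date-filter step, the snippet below keeps only posts whose extracted date falls within the window. The field names on Post are assumptions for illustration; in the actual workflow this logic lives in an n8n node rather than standalone code.

```typescript
// Minimal sketch of the 70-day filter, assuming each post arrives as an
// object with a parseable date string. Field names are illustrative.

interface Post {
  title: string;
  link: string;
  date: string; // date string as extracted from the page, e.g. "2024-05-01"
  content: string;
}

const WINDOW_DAYS = 70;

function filterRecent(posts: Post[], windowDays: number = WINDOW_DAYS): Post[] {
  const cutoff = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  return posts.filter((post) => {
    const ts = Date.parse(post.date);
    return !Number.isNaN(ts) && ts >= cutoff; // drop unparseable or stale dates
  });
}
```

Dropping posts with unparseable dates, rather than passing them downstream, keeps the summarization step from spending API calls on malformed items.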

For each post, the workflow fetches the full article page, extracts the main content and title, and sends the content to ChatGPT to generate a concise summary and identify the top three technical keywords. The resulting data (post title, date, link, summary, and keywords) is then merged and cleaned up.
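The per-article summarization step could be approximated as in the following sketch, using the official openai npm package; the model name, prompt wording, and JSON response shape are assumptions for illustration, not the workflow's exact OpenAI node settings.

```typescript
// Hedged sketch of the ChatGPT summarization call; the prompt and model
// are assumptions, not the workflow's exact configuration.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function summarize(
  articleText: string
): Promise<{ summary: string; keywords: string[] }> {
  const res = await client.chat.completions.create({
    model: "gpt-4o-mini", // assumed model; swap in whatever the workflow uses
    response_format: { type: "json_object" },
    messages: [
      {
        role: "user",
        content:
          "Summarize the following article in two or three sentences and " +
          'list its top three technical keywords. Reply as JSON: {"summary": ' +
          `string, "keywords": string[]}.\n\n${articleText}`,
      },
    ],
  });
  // The model was asked for a JSON object, so parse its reply directly.
  return JSON.parse(res.choices[0].message.content ?? "{}");
}
```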

Finally, the enriched dataset is sent to a NocoDB database, providing an organized and accessible way to review the latest news insights. This workflow is particularly useful for media monitoring, competitive intelligence, or content curation where up-to-date information extraction is critical.
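Outside n8n, the same insert could be reproduced with a plain HTTP call to NocoDB's v2 records endpoint, as sketched below; the base URL, table ID, column names, and token variable are placeholders rather than values from this workflow.

```typescript
// Hedged sketch of the NocoDB insert via its v2 REST API; the base URL,
// table ID, and column names are placeholders, not the workflow's values.

interface EnrichedPost {
  Title: string;
  Date: string;
  Link: string;
  Summary: string;
  Keywords: string; // e.g. "kubernetes, rust, webassembly"
}

async function storePost(post: EnrichedPost): Promise<void> {
  const res = await fetch(
    "https://nocodb.example.com/api/v2/tables/<tableId>/records", // placeholder
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "xc-token": process.env.NOCODB_API_TOKEN ?? "", // NocoDB API token
      },
      body: JSON.stringify(post),
    }
  );
  if (!res.ok) {
    throw new Error(`NocoDB insert failed: ${res.status} ${await res.text()}`);
  }
}
```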

Node Count

>20 Nodes

Nodes Used

code, html, httpRequest, itemLists, merge, nocoDb, openAi, scheduleTrigger, set, stickyNote
