Automated News Scraper & Summarizer with n8n

This n8n workflow scrapes news articles from a website that has no RSS feed, extracts the key content, summarizes each article, identifies important keywords, and stores the results in a database. A schedule trigger runs it weekly: it pulls the latest news posts, filters for recent articles, and uses OpenAI’s GPT-4 for content analysis and summarization. The data is then organized and saved in a NocoDB database for easy access and further processing.

The workflow involves several nodes:

– A schedule trigger activates the process weekly.

– HTML extraction nodes scrape the webpage using CSS selectors for post links, dates, titles, and content.

– Filter nodes select only recent posts based on date.

– HTTP request nodes fetch full content from individual post links.

– OpenAI nodes generate summaries and extract key technical keywords.

– Merge nodes organize data into structured JSON objects.

– A NocoDB node stores the resulting records in the database.
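
The date filter and the merge step can be illustrated outside n8n with a short Python sketch. This is not the workflow's own code, which is built from n8n nodes. The field names (title, link, date, summary, keywords), the sample posts, the date format, and the 7-day window are all assumptions chosen to match a weekly schedule.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical scraped items; field names and values are illustrative only.
posts = [
    {"title": "Release notes", "link": "https://example.com/news/1", "date": "2024-05-06"},
    {"title": "Older announcement", "link": "https://example.com/news/2", "date": "2024-04-01"},
]


def filter_recent(posts, now, max_age_days=7):
    """Keep posts whose date falls within the last max_age_days days."""
    cutoff = now - timedelta(days=max_age_days)
    recent = []
    for post in posts:
        # Assumes dates arrive as YYYY-MM-DD strings; real sites may differ.
        published = datetime.strptime(post["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        if cutoff <= published <= now:
            recent.append(post)
    return recent


def merge_record(post, summary, keywords):
    """Combine scraped fields with the model output into one flat record."""
    return {**post, "summary": summary, "keywords": ", ".join(keywords)}


# A fixed "now" keeps the example reproducible; a real run would use the current time.
now = datetime(2024, 5, 8, tzinfo=timezone.utc)
recent = filter_recent(posts, now)

# Placeholder summary and keywords stand in for the OpenAI node's output.
records = [
    merge_record(p, summary="(summary from the model)", keywords=["n8n", "scraping"])
    for p in recent
]
```

Each resulting record is a flat dictionary, which maps naturally onto one row in a database table. Joining the keywords into a single string is one possible choice for a text column; a list or a linked table would work as well.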

This setup is ideal for monitoring news updates from sites lacking RSS feeds, keeping content up-to-date automatically with minimal manual effort, and preparing summaries and insights for further analysis or publication.

Node Count

>20 Nodes

Nodes Used

code, html, httpRequest, itemLists, merge, nocoDb, openAi, scheduleTrigger, set, stickyNote
