This n8n workflow automates scraping news articles from a website that has no RSS feed. It extracts the key content, summarizes each article, identifies important keywords, and stores the results in a database. A schedule (cron) trigger starts it once a week. It pulls the latest news posts, keeps only recent articles, and uses OpenAI's GPT-4 to analyze and summarize the content. The results are then organized and saved in a NocoDB database for easy access and further processing.
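The recency filter below is a minimal Python sketch of keeping only recent articles. It rests on several assumptions that do not come from the workflow itself:

- Each scraped post carries an ISO-formatted date.
- "Recent" means published within the seven days since the previous weekly run.
- The field names `title`, `link`, and `date`, and the seven-day window, are hypothetical.

```python
from datetime import date, timedelta


def filter_recent(posts, today, window_days=7):
    """Keep posts whose ISO 'date' falls within the last `window_days` days.

    `posts` is a list of dicts with hypothetical keys 'title', 'link', 'date'.
    The 7-day default is an assumption chosen to match a weekly trigger.
    """
    cutoff = today - timedelta(days=window_days)
    recent = []
    for post in posts:
        published = date.fromisoformat(post["date"])
        # Exclude the cutoff day itself so consecutive weekly windows
        # do not overlap; future-dated posts are also skipped.
        if cutoff < published <= today:
            recent.append(post)
    return recent


if __name__ == "__main__":
    posts = [
        {"title": "Release notes", "link": "/news/release", "date": "2024-05-06"},
        {"title": "Old update", "link": "/news/old", "date": "2024-04-20"},
    ]
    for p in filter_recent(posts, today=date(2024, 5, 8)):
        print(p["title"])
```

In the workflow itself, the same comparison would live in the Filter node's date condition rather than in a script.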
The workflow involves several nodes:
– A schedule trigger activates the process weekly.
– HTML extraction nodes scrape the webpage using CSS selectors for post links, dates, titles, and content.
– Filter nodes select only recent posts based on date.
– HTTP request nodes fetch full content from individual post links.
– OpenAI nodes generate summaries and extract key technical keywords.
– Merge nodes organize data into structured JSON objects.
– Finally, the data is stored in a NocoDB database.
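The extraction and merge steps can be sketched in plain Python. The sketch uses the standard-library `html.parser` as a stand-in for the HTML extraction node's CSS selectors. It assumes a hypothetical listing page where links, titles, and dates carry the classes `post-link`, `post-title`, and `post-date`. The storage column names are also hypothetical.

The sketch leaves out fetching each article over HTTP and calling OpenAI. The summary and keywords are simply passed in, as they would arrive from the OpenAI nodes.

```python
import json
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collects post links, dates and titles from a news listing page.

    Stand-in for CSS selectors; the class names 'post-link', 'post-date'
    and 'post-title' are hypothetical and depend on the target site.
    """

    def __init__(self):
        super().__init__()
        self.links, self.dates, self.titles = [], [], []
        self._title_tag = None  # tag name of the title element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "post-link" in classes:
            self.links.append(attrs.get("href"))
        elif tag == "time" and "post-date" in classes:
            self.dates.append(attrs.get("datetime"))
        elif "post-title" in classes:
            self._title_tag = tag
            self.titles.append("")

    def handle_endtag(self, tag):
        # Stop collecting title text only when the title element closes,
        # so nested tags such as <em> inside a title are kept.
        if tag == self._title_tag:
            self._title_tag = None

    def handle_data(self, data):
        if self._title_tag is not None:
            self.titles[-1] += data


def merge_by_position(links, dates, titles):
    """Zip the parallel selector results into one JSON-style object per post."""
    return [
        {"link": link, "date": published, "title": title.strip()}
        for link, published, title in zip(links, dates, titles)
    ]


def build_record(post, summary, keywords):
    """Shape one row for the database; the column names are hypothetical."""
    return {
        "Title": post["title"],
        "URL": post["link"],
        "Published": post["date"],
        "Summary": summary,
        "Keywords": ", ".join(keywords),
    }


if __name__ == "__main__":
    page = """
    <article>
      <a class="post-link" href="https://example.com/news/1">
        <h2 class="post-title">First post</h2>
      </a>
      <time class="post-date" datetime="2024-05-06">May 6</time>
    </article>
    """
    parser = ListingParser()
    parser.feed(page)
    parser.close()
    posts = merge_by_position(parser.links, parser.dates, parser.titles)
    record = build_record(posts[0], "Placeholder summary", ["n8n", "automation"])
    print(json.dumps(record, indent=2))
```

Merging by position assumes every selector matches exactly once per post, in the same order. If a post lacks a date, for example, the lists fall out of step. Extracting each post's container first and reading its fields from there is more robust.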
This setup suits monitoring news from sites that lack RSS feeds. It keeps the collected content up to date automatically with minimal manual effort and prepares summaries and keywords for further analysis or publication.