This n8n workflow extracts airline web check-in and policy information with an LLM and stores it in a structured form for analysis and retrieval. It starts from a trigger (the node list includes a chat trigger; a Webhook or schedule trigger would also fit), then reads airline URLs from a Google Sheet. For each URL, an HTTP Request node sends a POST request, with the required authentication headers, to scrape the airline webpage. An LLM chain node backed by an Ollama chat model then processes the raw page content and extracts check-in details, baggage policies, support information, and FAQs, turning messy HTML into a clean JSON object that follows a predefined schema. The structured data is written back to a Google Sheet for visibility. The text is also split into token-based chunks, embedded with Ollama embeddings, and stored in a PostgreSQL PGVector store for semantic search. URLs are processed in batches with a wait between batches to manage load, so the workflow can be rerun to keep airline data current. It is aimed at travel agencies, data aggregators, and airline compliance teams that need automated, retrievable airline policy data.
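
The batch-and-wait flow described above can be sketched outside n8n as plain Python. This is only an illustration under stated assumptions, not the workflow's actual implementation: the `AirlinePolicy` field names, the `scrape` and `extract` callables, and the default batch size and delay are all hypothetical stand-ins for the HTTP Request node, the LLM chain node, and the Split In Batches and Wait nodes.

```python
import time
from dataclasses import dataclass, field, fields
from typing import Callable, Iterator


@dataclass
class AirlinePolicy:
    """Hypothetical target schema; the real workflow's schema is not shown."""
    url: str
    check_in: dict = field(default_factory=dict)
    baggage: dict = field(default_factory=dict)
    support: dict = field(default_factory=dict)
    faqs: list = field(default_factory=list)


def batches(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run(
    urls: list,
    scrape: Callable[[str], str],      # stand-in for the HTTP POST scrape step
    extract: Callable[[str], dict],    # stand-in for the LLM extraction step
    batch_size: int = 5,               # assumed value
    delay_seconds: float = 1.0,        # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Scrape and extract each URL, pausing between batches."""
    allowed = {f.name for f in fields(AirlinePolicy)} - {"url"}
    chunks = list(batches(urls, batch_size))
    results = []
    for index, chunk in enumerate(chunks):
        for url in chunk:
            raw = scrape(url)
            extracted = extract(raw)
            # Keep only keys the schema knows about; ignore anything else.
            known = {k: v for k, v in extracted.items() if k in allowed}
            results.append(AirlinePolicy(url=url, **known))
        # Wait between batches, but not after the last one.
        if index < len(chunks) - 1:
            sleep(delay_seconds)
    return results
```

Passing `scrape`, `extract`, and `sleep` in as parameters keeps the sketch testable without network access or a running model; in the workflow itself those roles are played by the corresponding n8n nodes.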

Node Count	11 – 20 Nodes
Nodes Used	@n8n/n8n-nodes-langchain.chainLlm, @n8n/n8n-nodes-langchain.chatTrigger, @n8n/n8n-nodes-langchain.documentDefaultDataLoader, @n8n/n8n-nodes-langchain.embeddingsOllama, @n8n/n8n-nodes-langchain.lmChatOllama, @n8n/n8n-nodes-langchain.textSplitterTokenSplitter, @n8n/n8n-nodes-langchain.vectorStorePGVector, googleSheets, httpRequest, splitInBatches, stickyNote, wait