This n8n workflow automates the extraction, analysis, and retrieval of information from PDF documents. The process begins by uploading PDFs to Mistral’s OCR API to extract their text. The extracted text is split into chunks, vectorized with OpenAI embeddings, and stored in a Qdrant vector database for similarity search. The workflow also includes steps for creating and managing Qdrant collections and for summarizing content with Google Gemini models.
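As a rough illustration of what happens to the OCR text before it is embedded, the sketch below splits text into overlapping chunks, which is presumably the job of the textSplitterTokenSplitter node in the node list. It is a simplified stand-in, not the node's implementation: it counts whitespace-separated words rather than model tokens, and the function name `split_into_chunks` and the `chunk_size` and `overlap` values are assumptions made for this example.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into overlapping chunks of whitespace-delimited words.

    Assumption: words stand in for tokens here; a real token splitter
    would count tokens produced by the model's tokenizer instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    tokens = text.split()
    step = chunk_size - overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical OCR output: ten placeholder words, w1 .. w10.
ocr_text = " ".join(f"w{i}" for i in range(1, 11))
chunks = split_into_chunks(ocr_text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap repeats the last words of one chunk at the start of the next, so a sentence that straddles a chunk boundary still appears intact in at least one chunk when it is later retrieved.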

A key feature is retrieval-augmented generation (RAG): questions posed to the AI are answered using relevant document snippets retrieved from the Qdrant database. Manual triggers and sub-workflows let documents be processed in batches, making the workflow suitable for large-scale document analysis, knowledge base building, or FAQ automation. A chat trigger allows real-time testing, so document content can be both summarized and queried through a single integration of PDF OCR, vector similarity search, and language models.
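The retrieval step can be sketched in a few lines. This is a minimal, self-contained illustration of the RAG idea, not the workflow's actual nodes: a toy bag-of-words `embed` function stands in for OpenAI embeddings, a plain Python list stands in for the Qdrant collection, and the sample documents, function names, and prompt wording are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy stand-in for an embedding model: a unit-length bag-of-words vector."""
    counts = Counter(word.strip(".,?!").lower() for word in text.split())
    counts.pop("", None)  # drop tokens that were pure punctuation
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; both vectors are already normalised by embed()."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


def retrieve(question: str, store: List[Tuple[str, Vector]], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k (score, text) pairs most similar to the question."""
    query = embed(question)
    scored = [(cosine(query, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, snippets: List[str]) -> str:
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical document snippets, as if extracted from PDFs and indexed.
documents = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts and labour for two years.",
    "Support is available by email on weekdays.",
]
store = [(text, embed(text)) for text in documents]

question = "How long does the warranty last?"
hits = retrieve(question, store, top_k=1)
prompt = build_prompt(question, [text for _, text in hits])
print(prompt)
```

In the workflow itself the similarity search runs inside Qdrant and the final prompt goes to a Gemini chat model; the sketch only shows how retrieved snippets end up as context for the question.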

Node Count

>20 Nodes

Nodes Used

@n8n/n8n-nodes-langchain.chainRetrievalQa, @n8n/n8n-nodes-langchain.chainSummarization, @n8n/n8n-nodes-langchain.chatTrigger, @n8n/n8n-nodes-langchain.documentDefaultDataLoader, @n8n/n8n-nodes-langchain.embeddingsOpenAi, @n8n/n8n-nodes-langchain.lmChatGoogleGemini, @n8n/n8n-nodes-langchain.retrieverVectorStore, @n8n/n8n-nodes-langchain.textSplitterTokenSplitter, @n8n/n8n-nodes-langchain.vectorStoreQdrant, code, executeWorkflow, executeWorkflowTrigger, googleDrive, httpRequest, manualTrigger, set, splitInBatches, stickyNote, wait