Context-Aware Document Chunking with Google Drive to Pinecone


This n8n workflow automates the processing and semantic indexing of large documents stored in Google Drive. It extracts, segments, contextually enriches, and vectorizes document chunks, enabling rapid retrieval in applications such as AI-powered search or knowledge management systems.

Triggered manually, the workflow retrieves a Google Doc by its file ID, then extracts and cleans the text and splits it into sections on custom delimiters. For each section, it generates a contextual summary with the OpenRouter Chat Model, situating the chunk within the larger document. The enriched sections are then converted into embedding vectors with the Google Gemini API and stored in a Pinecone vector database, enabling highly relevant, context-aware search across extensive text. Practical use cases include advanced document search systems, knowledge-base indexing, and AI assistants that require deep contextual understanding of large texts.
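The split-and-contextualize step above can be sketched in plain Python. This is an illustrative sketch, not the workflow's actual Code-node contents: the delimiter, the `clean`/`split_sections`/`contextualize` names, and the static context prefix are all assumptions (in the real workflow, the contextual summary comes from the OpenRouter Chat Model, not a fixed string).

```python
# Hypothetical sketch of the chunking step: split a document on a custom
# delimiter, normalize whitespace, and prefix each section with a short
# context line so the embedding later carries document-level context.

DELIMITER = "\n---\n"  # assumed custom section delimiter

def clean(text: str) -> str:
    """Collapse runs of whitespace and trim the section."""
    return " ".join(text.split())

def split_sections(document: str, delimiter: str = DELIMITER) -> list[str]:
    """Split the raw document text into non-empty, cleaned sections."""
    return [clean(s) for s in document.split(delimiter) if s.strip()]

def contextualize(section: str, doc_title: str) -> str:
    """Prefix each chunk with context; in the real workflow this summary
    is generated per-chunk by the OpenRouter Chat Model."""
    return f"[Document: {doc_title}]\n{section}"

doc = "Intro  text.\n---\nSection two\n  details.\n---\n"
chunks = [contextualize(s, "Example Doc") for s in split_sections(doc)]
```

Each resulting chunk would then be passed to the embedding model and upserted into Pinecone alongside its metadata.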

Node Count

11 – 20 Nodes

Nodes Used

@n8n/n8n-nodes-langchain.agent, @n8n/n8n-nodes-langchain.documentDefaultDataLoader, @n8n/n8n-nodes-langchain.embeddingsGoogleGemini, @n8n/n8n-nodes-langchain.lmChatOpenRouter, @n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter, @n8n/n8n-nodes-langchain.vectorStorePinecone, code, extractFromFile, googleDrive, manualTrigger, set, splitInBatches, splitOut, stickyNote
