This workflow automates the process of extracting, splitting, embedding, and storing documents from Google Drive into a vector database, streamlining data management and retrieval. It is triggered manually or on a schedule, making it ideal for maintaining an up-to-date vector database for applications like search, AI, or data analysis.
The process begins with a manual trigger or scheduled run, which prompts the workflow to search a specific folder on Google Drive for files. Each file is then downloaded and routed by its MIME type (PDF, text, or JSON) to a dedicated extraction node that pulls out its contents. The extracted content is then passed through a recursive text splitter, which breaks it into manageable chunks of a suitable size.
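The routing and splitting steps can be sketched in plain Python. This is an illustration only, not the workflow's actual nodes: the function names `extract_text` and `recursive_split`, the separator order, and the chunk size are all assumptions chosen for the example, and PDF extraction is left out because it needs a third-party library.

```python
import json
from typing import List, Sequence


def extract_text(mime_type: str, data: bytes) -> str:
    """Hypothetical stand-in for the per-MIME-type extraction nodes."""
    if mime_type == "text/plain":
        return data.decode("utf-8")
    if mime_type == "application/json":
        # Re-serialize so the splitter sees readable, line-broken text.
        return json.dumps(json.loads(data), indent=2)
    if mime_type == "application/pdf":
        raise NotImplementedError("PDF extraction needs a PDF library")
    raise ValueError(f"unsupported MIME type: {mime_type}")


def recursive_split(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Coarse separators (paragraphs) are tried first; a piece that is still
    too long is split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut at a fixed width.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks
```

Trying paragraph breaks before line breaks and spaces keeps related sentences together where possible, which tends to give embeddings more coherent input than fixed-width cuts alone.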
Next, the text or JSON data is passed through an OpenAI embedding model, creating vector representations of the data. These embeddings, along with the original documents, are stored in a PostgreSQL-based vector store, enabling fast similarity searches later. Simultaneously, each processed file is moved to a designated ‘vectorized’ folder on Google Drive for organization.
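To make the embed-store-search idea concrete, here is a minimal in-memory sketch. It does not call OpenAI or PostgreSQL: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the embedding model, and `InMemoryVectorStore` is an assumed substitute for the database table, so only the overall shape (embed on insert, rank by cosine similarity on query) mirrors the workflow.

```python
import hashlib
import math
from typing import List, Tuple


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hypothetical embedding: hash each token into a bucket, then normalize.

    A real embedding model captures meaning; this only captures word overlap.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Assumed stand-in for the PostgreSQL-based vector store."""

    def __init__(self) -> None:
        self.rows: List[Tuple[str, List[float]]] = []

    def add(self, text: str) -> None:
        # Store the original text alongside its vector, as the workflow does.
        self.rows.append((text, toy_embed(text)))

    def search(self, query: str, k: int = 1) -> List[Tuple[float, str]]:
        # Vectors are unit length, so the dot product is the cosine similarity.
        q = toy_embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for text, vec in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

A short usage example under the same assumptions:

```python
store = InMemoryVectorStore()
store.add("invoice payment terms net thirty days")
store.add("employee onboarding checklist for new hires")
best_score, best_text = store.search("invoice payment terms net thirty days")[0]
```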
This workflow is practical for businesses or developers who need to automate the ingestion of various document types into a vector database for search solutions, knowledge management, or AI-powered applications.