Automated Text Extraction from PDFs and Images to CSV

This n8n workflow extracts text from PDFs and images and converts it into structured CSV files. It monitors a specified Google Drive folder for new PDF or image files. When a new file arrives, a Switch node routes it by MIME type. PDFs are downloaded and their text is extracted with the "extractFromFile" node, while images are sent to Vertex AI for text recognition. The extracted text is then passed to an AI model (via an HTTP request), which parses the transaction data, assigns each transaction a category, and organizes the result as CSV. Finally, the CSV files are uploaded back to Google Drive for storage and further analysis.

The workflow suits financial transaction processing, invoice extraction, and any other task that needs structured data from scanned documents or PDFs, and it cuts down on manual data entry.
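Two steps in this pipeline are deterministic and easy to sketch outside n8n: routing a file by MIME type, and writing parsed transactions out as CSV. The Python below is a minimal sketch under stated assumptions, not the workflow's actual node configuration. The route names ("pdf", "image", "unsupported"), the column layout (date, description, amount, category), and the idea that the AI model returns one record per transaction are all hypothetical. Text extraction, Vertex AI recognition, and the categorization itself stay with the workflow's own nodes.

```python
import csv
import io

# Hypothetical route names standing in for the Switch node's branches.
PDF_ROUTE = "pdf"
IMAGE_ROUTE = "image"
UNSUPPORTED_ROUTE = "unsupported"

# Hypothetical column layout for parsed transactions (not taken from the workflow).
FIELDS = ["date", "description", "amount", "category"]


def route_by_mime(mime_type: str) -> str:
    """Pick a processing branch from a file's MIME type."""
    mime = mime_type.strip().lower()
    if mime == "application/pdf":
        return PDF_ROUTE  # text can be extracted directly from the PDF
    if mime.startswith("image/"):
        return IMAGE_ROUTE  # text must be recognized from the image first
    return UNSUPPORTED_ROUTE  # anything else is not handled by this sketch


def transactions_to_csv(transactions: list[dict]) -> str:
    """Serialize transaction records to CSV text with a header row.

    Keys outside FIELDS are ignored; missing keys become empty cells
    (csv.DictWriter's default restval is "").
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in transactions:
        writer.writerow(record)  # fields containing commas are quoted automatically
    return buffer.getvalue()


if __name__ == "__main__":
    print(route_by_mime("application/pdf"))  # prints "pdf"
    rows = [
        {"date": "2024-01-05", "description": "Coffee shop", "amount": "-4.50", "category": "Food"},
        {"date": "2024-01-06", "description": "Salary", "amount": "2500.00", "category": "Income"},
    ]
    print(transactions_to_csv(rows))
```

Using csv.DictWriter rather than joining strings by hand means descriptions that contain commas or quotes are escaped correctly, which matters for free-form transaction text pulled from scanned documents.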

Node Count

11 – 20 Nodes

Nodes Used

@n8n/n8n-nodes-langchain.chainLlm, @n8n/n8n-nodes-langchain.lmChatGoogleGemini, convertToFile, extractFromFile, googleDrive, googleDriveTrigger, httpRequest, stickyNote, switch
