Automated Bank Statement Data Extraction Workflow


This n8n workflow extracts structured data from bank statements stored in Google Drive, turning scanned or PDF documents into usable financial information. It handles both digitally generated PDFs and scanned images, which makes it useful for retrieving data from a variety of bank statement formats.

The process starts from a manual trigger, which can be replaced by a trigger from another system, and downloads a bank statement PDF from Google Drive. An external web service converts each PDF page into an image and returns the results as a ZIP file, which the workflow unzips into individual images. These images are then resized to optimize them for optical recognition.
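The unzip, ordering, and resize steps above can be sketched outside n8n in plain Python. This is a minimal illustration and not the workflow's own node configuration: the function names, the assumption that page images carry a page number in their filename, and the 1024-pixel limit are all hypothetical. Only the target dimensions are computed here; the actual pixel resizing is left to an image library.

```python
import io
import re
import zipfile


def page_number(filename: str) -> int:
    """Return the last number found in a filename (assumed to be the page number)."""
    digits = re.findall(r"\d+", filename)
    return int(digits[-1]) if digits else 0


def unzip_pages(zip_bytes: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, data) pairs for every file in the archive, in page order."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
        # Sort numerically so that page 10 follows page 9 rather than page 1.
        names.sort(key=page_number)
        return [(name, archive.read(name)) for name in names]


def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough, never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Sorting by page number matters because the transcribed pages are later combined, and an archive does not guarantee that its entries are listed in reading order.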

Next, a vision-language model (such as Google Gemini) transcribes the images into markdown text. The transcription is meant to preserve visible layout details, such as tables and headings. A language model then processes the markdown text to extract specific financial data, such as deposit transactions.
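To make the shape of the extracted data concrete, the sketch below pulls deposit rows out of a markdown table with ordinary string parsing. The workflow itself performs this step with a language model, so this is a deterministic stand-in for illustration only. The column names ("Date", "Description", "Deposits") and the function names are assumptions, since real statements label their columns differently.

```python
from decimal import Decimal, InvalidOperation


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse pipe-delimited markdown tables into one dict per data row."""
    rows: list[dict[str, str]] = []
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            header = None  # a non-table line ends the current table
            continue
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if header is None:
            header = cells  # first row of a table is its header
            continue
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # separator row such as |---|---|
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def extract_deposits(markdown: str, deposit_column: str = "Deposits") -> list[dict]:
    """Return rows with a positive amount in the (assumed) deposit column."""
    deposits = []
    for row in parse_markdown_table(markdown):
        raw = row.get(deposit_column, "").replace(",", "").replace("$", "").strip()
        if not raw:
            continue  # empty cell: this row is not a deposit
        try:
            amount = Decimal(raw)
        except InvalidOperation:
            continue  # not a number, e.g. a note or a subtotal label
        if amount > 0:
            deposits.append({
                "date": row.get("Date", ""),
                "description": row.get("Description", ""),
                "amount": amount,
            })
    return deposits
```

A rule-based parser like this breaks as soon as a bank uses different headers or splits a transaction across lines, which is the kind of variation a language model handles more gracefully.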

Throughout the workflow, sticky notes provide detailed explanations and guidance, making it accessible for users looking to automate document processing and data extraction tasks, especially for scanned PDFs or complex financial reports.

Node Count

11 – 20 Nodes

Nodes Used

@n8n/n8n-nodes-langchain.chainLlm, @n8n/n8n-nodes-langchain.informationExtractor, @n8n/n8n-nodes-langchain.lmChatGoogleGemini, aggregate, code, compression, editImage, googleDrive, httpRequest, manualTrigger, sort, stickyNote
