This n8n workflow converts PDFs into clean, editable Google Docs using Mistral OCR. A user submits a PDF through a form, which triggers the workflow. The PDF is uploaded to Mistral for OCR, which extracts text from the document pages and from embedded images. The workflow extracts the image placeholders and runs OCR on the inline images to improve accuracy. It then merges the text extracted from pages and images, and cleans and normalizes the result to remove noise, boilerplate, and formatting artifacts. The cleaned content is inserted into a new Google Document in a specified folder. The workflow suits digitizing printed documents, reports, or scanned materials into editable form, and is especially useful for document management, archival, and content repurposing.
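The merge and cleanup steps described above can be sketched in Python. This is a minimal illustration, not the workflow's actual Code node logic: it assumes the OCR output is Markdown in which inline images appear as placeholders such as `![img-0.jpeg](img-0.jpeg)`, and the function names (`merge_image_text`, `clean_text`, `build_document`) and the specific cleanup rules are hypothetical.

```python
import re

# Assumed placeholder shape in the OCR Markdown: ![img-0.jpeg](img-0.jpeg)
PLACEHOLDER = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def merge_image_text(page_markdown: str, image_text: dict[str, str]) -> str:
    """Replace each image placeholder with the OCR text for that image.

    Placeholders with no matching entry in image_text are dropped.
    """
    def substitute(match: re.Match) -> str:
        image_id = match.group(2)
        return image_text.get(image_id, "")

    return PLACEHOLDER.sub(substitute, page_markdown)


def clean_text(text: str) -> str:
    """Apply a few illustrative normalization rules (hypothetical choices)."""
    text = text.replace("\r\n", "\n")
    # Rejoin words split by a hyphen at a line break. This can also join
    # genuine hyphenated compounds, so treat it as a heuristic.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Drop lines that contain only a page number, e.g. "3" or "Page 3".
    lines = [line.rstrip() for line in text.split("\n")]
    lines = [ln for ln in lines if not re.fullmatch(r"\s*(Page\s+)?\d+\s*", ln)]
    text = "\n".join(lines)
    # Collapse runs of spaces/tabs and runs of blank lines.
    text = re.sub(r"[ \t]{2,}", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def build_document(pages: list[str], image_text: dict[str, str]) -> str:
    """Merge per-page Markdown with image OCR text, then clean the result."""
    merged = [merge_image_text(page, image_text) for page in pages]
    return clean_text("\n\n".join(merged))
```

A call such as `build_document(pages, image_text)` would take the per-page Markdown and a mapping from image identifier to recognized text, and return the string to insert into the new Google Doc. In the workflow itself this logic is spread across the merge, aggregate, and code nodes, so the single-function form here is only for readability.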
Automated PDF to Google Doc Workflow with OCR and Cleanup
Node Count | 11 – 20 Nodes |
---|---|
Nodes Used | @n8n/n8n-nodes-langchain.chainLlm, @n8n/n8n-nodes-langchain.lmChatOpenRouter, aggregate, code, formTrigger, googleDocs, httpRequest, merge, set, stickyNote |