This workflow analyzes images stored on Google Drive with locally hosted Ollama vision language models. It downloads an image, runs it through several models (e.g., Granite3.2, Llama3.2, Gemma3) to generate comprehensive descriptions, and saves the results in Google Docs. Each model receives a prompt that asks for an exhaustive analysis covering object inventory, contextual insights, spatial relationships, and text extraction, with the output formatted in markdown.
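As a rough illustration of the analysis step outside the workflow editor, the sketch below sends one image to a local Ollama server through its `/api/generate` endpoint. The URL assumes Ollama's default local port, and the prompt wording and the function names `build_payload` and `describe_image` are hypothetical, written for this example rather than taken from the workflow.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt covering the four analysis areas named above.
PROMPT = (
    "Describe this image exhaustively in markdown. Include an inventory "
    "of objects, contextual insights, spatial relationships, and any "
    "visible text."
)


def build_payload(model: str, image_bytes: bytes, prompt: str = PROMPT) -> dict:
    """Build the JSON body for a single non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(model: str, image_bytes: bytes, url: str = OLLAMA_URL) -> str:
    """Send the image to one model and return its text description."""
    body = json.dumps(build_payload(model, image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # With stream=False the reply is one JSON object with a "response" field.
        return json.loads(response.read())["response"]
```

Calling `describe_image("gemma3", image_bytes)` would require that model to be pulled on the local server; the model tag here is an assumption and should match whatever `ollama list` reports.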
The workflow starts with a manual trigger for testing, then downloads a specified image file from Google Drive. A list of vision models is split into individual items and processed in a loop: each model analyzes the image and generates a detailed textual description based on the provided prompt. The structured descriptions are stored in Google Docs for collaboration and review. Sticky notes in the workflow guide users through key steps such as downloading the image and creating the model list.
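The loop over models can be pictured as follows. This is a minimal sketch, not the workflow's actual node logic: `run_models` is a hypothetical helper, and `analyze` stands in for whatever function sends the image to a single model and returns its description. The result is one markdown string with a section per model, which is the kind of text that would then be written to a Google Doc.

```python
from typing import Callable, Iterable


def run_models(
    models: Iterable[str],
    image_bytes: bytes,
    analyze: Callable[[str, bytes], str],
) -> str:
    """Run every model on the same image and join the results as markdown."""
    sections = []
    for model in models:
        # Each model sees the same image and produces its own description.
        description = analyze(model, image_bytes)
        sections.append(f"## {model}\n\n{description.strip()}")
    # Blank lines between sections keep the markdown headings separate.
    return "\n\n".join(sections)
```

Passing the analysis function in as a parameter keeps the loop independent of the model backend, so it can be exercised with a stub before pointing it at a real server.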
This automation suits developers, data analysts, real estate professionals, and AI enthusiasts who need in-depth image understanding, structured data extraction, and documentation. Use cases include real estate image analysis, product photography reviews, visual research, and preparing AI training datasets.