This n8n workflow fetches an image, generates a descriptive caption with AI, and overlays that caption onto the image. It uses Google Gemini’s multimodal capabilities to analyze the image and produce a contextually relevant caption. The workflow runs in six steps: a manual trigger starts it, an HTTP request downloads the image from a URL, the image is resized for AI processing, Gemini generates the caption, a code node calculates the position of the caption overlay, and an image-editing node adds the caption text at the bottom of the image. This is useful for producing captioned images for social media and blog posts, or for watermarking images with descriptive labels.

The process begins when you manually start the workflow. It downloads a sample image from Pexels, resizes it for AI analysis, and sends it to Google Gemini, which generates a creative caption based on the image content. A custom code node then calculates where to place the caption so that it fits the image. Finally, the caption is overlaid onto the bottom of the image using n8n’s image editing features. The result is a ready-to-use captioned image, suitable for social media posts, blogs, or copyright watermarks.
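The placement step can be illustrated with a rough sketch. The workflow's actual Code node is not reproduced in this description, so the following is only an assumption about what such a calculation might look like, written in Python for readability. The function name `layout_caption`, the `CaptionLayout` type, and all default values (font size, padding, character-width and line-height ratios) are hypothetical and chosen for illustration.

```python
import textwrap
from dataclasses import dataclass
from typing import List


@dataclass
class CaptionLayout:
    lines: List[str]   # caption split into lines that fit the image width
    font_size: int
    x: int             # left edge of the caption block
    y: int             # top edge of the first caption line


def layout_caption(caption: str, image_width: int, image_height: int,
                   font_size: int = 24, padding: int = 20,
                   avg_char_width_ratio: float = 0.6,
                   line_height_ratio: float = 1.2) -> CaptionLayout:
    """Estimate where a caption block should sit at the bottom of an image.

    Assumption: character width is approximated as a fixed fraction of the
    font size, since no real font metrics are available here.
    """
    usable_width = image_width - 2 * padding
    chars_per_line = max(1, int(usable_width / (font_size * avg_char_width_ratio)))

    # Wrap the caption so that no line exceeds the estimated width.
    lines = textwrap.wrap(caption, width=chars_per_line) or [""]

    # Stack the lines upward from the bottom edge, leaving `padding` below.
    line_height = int(font_size * line_height_ratio)
    block_height = line_height * len(lines)
    y = max(0, image_height - padding - block_height)

    return CaptionLayout(lines=lines, font_size=font_size, x=padding, y=y)


if __name__ == "__main__":
    layout = layout_caption("A quiet mountain lake at sunrise", 512, 512)
    print(layout)
```

Wrapping before positioning means a longer caption simply pushes the block's top edge upward rather than running off the side of the image. A real implementation would likely use measured font metrics instead of the fixed width ratio assumed here.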

This workflow shows how vision models, language generation, and image editing can be combined in a single automation, so that content creators and marketers can caption images without manual editing.

Node Count	11 – 20 Nodes
Nodes Used	@n8n/n8n-nodes-langchain.chainLlm, @n8n/n8n-nodes-langchain.lmChatGoogleGemini, @n8n/n8n-nodes-langchain.outputParserStructured, code, editImage, httpRequest, manualTrigger, merge, stickyNote