This n8n workflow automates the processing of media shared on a Discord server, combining AI analysis, transcription, and image generation. It monitors specific Discord messages and their attachments and categorizes each attachment so that it is routed to the appropriate processing branch. Audio files are fetched and transcribed with Groq’s Whisper model; images are downloaded, converted, and analyzed with Groq’s Vision Model to produce descriptions or prompts. An AI agent backed by an Ollama chat model generates contextual responses, using Wikipedia and Google Search (via SerpApi) as tools for information retrieval. A window memory buffer maintains context across interactions so that replies stay coherent. The result is an AI-powered assistant that interprets media, generates images from descriptions, and replies within Discord, which suits community engagement and automated media interactions.
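The categorization step could look roughly like the sketch below. It is not taken from the workflow itself: the function names, the extension lists, and the three-way split into `audio`, `image`, and `other` are assumptions made for illustration. It also assumes each attachment is a dictionary with `content_type` and `filename` fields, as Discord attachment objects typically provide.

```python
import os

# Hypothetical extension lists, used only as a fallback when the
# attachment carries no usable content type. Extend as needed.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg", ".m4a", ".flac"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def categorize_attachment(attachment: dict) -> str:
    """Return 'audio', 'image', or 'other' for one attachment.

    Assumes the attachment dict may contain 'content_type' (a MIME type)
    and 'filename'; either one may be missing or None.
    """
    content_type = (attachment.get("content_type") or "").lower()
    if content_type.startswith("audio/"):
        return "audio"
    if content_type.startswith("image/"):
        return "image"

    # Fall back to the file extension when the MIME type is not decisive.
    filename = (attachment.get("filename") or "").lower()
    extension = os.path.splitext(filename)[1]
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in IMAGE_EXTENSIONS:
        return "image"
    return "other"


def route_attachments(attachments: list) -> dict:
    """Group attachments by category so each group can go to its own branch."""
    routes = {"audio": [], "image": [], "other": []}
    for attachment in attachments:
        routes[categorize_attachment(attachment)].append(attachment)
    return routes
```

In this sketch, the `audio` group would go to the transcription branch and the `image` group to the vision branch. Anything in `other` would skip media processing. In n8n itself, the same decision is more likely to be expressed with `if` nodes or a `code` node.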

Node Count	>20 Nodes
Nodes Used	@n8n/n8n-nodes-langchain.agent, @n8n/n8n-nodes-langchain.lmChatOllama, @n8n/n8n-nodes-langchain.memoryBufferWindow, @n8n/n8n-nodes-langchain.toolSerpApi, @n8n/n8n-nodes-langchain.toolWikipedia, code, discord, httpRequest, if, n8nTrigger, stickyNote, wait