This n8n workflow implements an AI voice chat system that transcribes incoming voice messages, maintains conversation context, and replies with generated audio. It combines three external services: OpenAI for speech-to-text, Google Gemini for generating responses, and ElevenLabs for speech synthesis.

The workflow starts with a Webhook trigger that receives voice messages posted by an external application. An OpenAI node transcribes the audio to text, and the transcript is passed to a Memory Manager node, which stores and retrieves the conversation context.
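For the client side, the external application only needs to POST the recording to the webhook URL. The sketch below builds (but does not send) such a multipart/form-data request with the Python standard library. The URL, the field names `voice_message` and `session_id`, and the MP3 content type are all hypothetical; match them to whatever your Webhook node and downstream expressions actually read.

```python
import uuid
import urllib.request

# Hypothetical URL; the real path is shown on your n8n Webhook node.
WEBHOOK_URL = "https://n8n.example.com/webhook/voice-chat"


def build_voice_request(audio: bytes, filename: str, session_id: str,
                        url: str = WEBHOOK_URL,
                        content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Build (not send) a multipart/form-data POST carrying one audio file.

    Field names "session_id" and "voice_message" are assumptions for illustration.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Text field identifying the conversation (hypothetical memory session key).
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="session_id"\r\n\r\n'
         f"{session_id}\r\n").encode(),
        # Binary field holding the recorded audio.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="voice_message"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode() + audio + b"\r\n",
        # Closing boundary marks the end of the form.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it; if the workflow answers with audio, the response body holds the synthesized reply.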

Next, the transcript and the retrieved context are sent through an LLM chain to a Google Gemini chat model, which generates the response. A window buffer memory keeps only the most recent exchanges, and the new response is inserted into this conversation history so that later turns can draw on it.
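The idea behind a window buffer memory can be sketched in a few lines: keep the last `k` user/AI exchanges per session and prepend them to each new message. This is a simplified stand-in that roughly mirrors the behaviour of n8n's Window Buffer Memory, not its implementation; the class name, the `k` default, and the `role: text` prompt layout are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WindowMemory:
    """Hypothetical window memory: keeps the last `k` exchanges per session."""
    k: int = 5
    sessions: dict = field(default_factory=dict)

    def _history(self, session_id: str) -> deque:
        # One exchange is two messages, so the deque holds at most 2 * k entries;
        # appending beyond that silently drops the oldest message.
        return self.sessions.setdefault(session_id, deque(maxlen=2 * self.k))

    def add_exchange(self, session_id: str, user_text: str, ai_text: str) -> None:
        history = self._history(session_id)
        history.append(("user", user_text))
        history.append(("ai", ai_text))

    def build_prompt(self, session_id: str, new_message: str) -> str:
        # Stored window first, then the new user message for the LLM to answer.
        lines = [f"{role}: {text}" for role, text in self._history(session_id)]
        lines.append(f"user: {new_message}")
        return "\n".join(lines)
```

With `k=1`, after the exchanges ("Hi", "Hello!") and ("Weather?", "Sunny."), `build_prompt("s1", "Thanks")` returns `"user: Weather?\nai: Sunny.\nuser: Thanks"`: the first exchange has fallen out of the window. A small window bounds prompt size at the cost of forgetting older turns.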

The AI response is then converted to speech with ElevenLabs' text-to-speech API, called through an HTTP Request node, and a Respond to Webhook node returns the resulting audio to the caller. The whole voice-in, voice-out exchange therefore completes within a single webhook request.
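The HTTP Request node essentially needs to issue a POST like the one built below. This sketch targets ElevenLabs' `v1/text-to-speech/{voice_id}` endpoint with the `xi-api-key` header; the voice ID, the API key, and the choice of `eleven_multilingual_v2` as the model are placeholders or assumptions, so check them against the workflow's actual node settings.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (not send) an ElevenLabs text-to-speech request.

    The response body of this request is the synthesized audio, which the
    workflow can pass straight to the Respond to Webhook node.
    """
    payload = {"text": text, "model_id": model_id}  # model choice is an assumption
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "xi-api-key": api_key,            # ElevenLabs API key header
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",           # ask for MP3 audio back
        },
    )
```

In the workflow, the same URL, headers, and JSON body would go into the HTTP Request node's fields, with the response format set to file/binary so that the audio can be returned unchanged.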

Supporting nodes include Sticky Notes that document the workflow on the canvas, plus Limit and Aggregate nodes that cap how many items pass between steps and merge several items into one. The setup suits voice-based chatbots, virtual assistants, and interactive voice response (IVR) systems that need context awareness and natural language processing.
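The description does not say exactly where the Limit and Aggregate nodes sit, but their general effect on n8n's item lists can be modelled simply. The functions below are simplified, hypothetical equivalents written for illustration (not n8n code), using n8n's convention that each item carries its data under a `json` key.

```python
from typing import Any

Item = dict[str, Any]  # an n8n-style item: {"json": {...}}


def limit(items: list[Item], max_items: int, keep: str = "first") -> list[Item]:
    """Keep at most `max_items` items from the start ("first") or end ("last")."""
    if max_items <= 0:
        return []  # guard: items[-0:] would otherwise return every item
    return items[:max_items] if keep == "first" else items[-max_items:]


def aggregate(items: list[Item], field_name: str) -> list[Item]:
    """Collapse many items into one item whose field holds a list of the values."""
    values = [item["json"][field_name] for item in items if field_name in item["json"]]
    return [{"json": {field_name: values}}]
```

For three items with `text` values `"a"`, `"b"`, `"c"`, `aggregate(limit(items, 2), "text")` yields `[{"json": {"text": ["a", "b"]}}]`. Reducing several items to one this way keeps downstream nodes, such as the LLM call, from running once per item.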

Node count: 11–20 nodes
Nodes used: @n8n/n8n-nodes-langchain.chainLlm, @n8n/n8n-nodes-langchain.lmChatGoogleGemini, @n8n/n8n-nodes-langchain.memoryBufferWindow, @n8n/n8n-nodes-langchain.memoryManager, @n8n/n8n-nodes-langchain.openAi, aggregate, httpRequest, limit, respondToWebhook, stickyNote, webhook