This n8n workflow automates the evaluation of the relevance and quality of AI-generated chat responses, particularly for question-and-answer (Q&A) systems. It combines Google Sheets for data management, OpenAI's language models for generating and analyzing text, and custom scoring logic to assess response relevance.
The workflow begins with a dataset trigger linked to a Google Sheet, which fetches new rows for evaluation. Each row is remapped into a standardized input format and passed to an AI agent powered by OpenAI, which generates a response. An evaluation node then decides whether the message should be scored or skipped.
The core of the process involves measuring how closely the AI response matches the original question. This is achieved by generating embeddings via OpenAI’s API, calculating cosine similarity to quantify semantic similarity, and determining the relevance score. Additional nodes generate questions from the AI’s answer to verify the consistency and relevance of the response.
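The cosine-similarity step described above can be sketched in a few lines of JavaScript, as it might appear in an n8n Code node. This is an illustrative sketch, not the workflow's actual node code; the function name and sample vectors are hypothetical, and real OpenAI embeddings would be much longer arrays of floats.

```javascript
// Cosine similarity between two embedding vectors (plain number arrays).
// Returns a value in [-1, 1]; values near 1 indicate high semantic similarity.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors for illustration only (real embeddings have hundreds of dimensions):
// identical directions score 1, orthogonal directions score 0.
console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

In the workflow, the two inputs would be the embedding of the original question and the embedding of the AI's answer, and the resulting score is what gets written back to the sheet as the relevance metric.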
Results are systematically recorded back into the Google Sheet, including the relevance score and other metrics. Sticky notes within the workflow offer guidance on setup, scoring methodology, and practical application scenarios.
Overall, this workflow is well suited to maintaining high standards in conversational AI systems: it provides automated, objective assessments of answer relevance, enabling continuous improvement of AI responses and user satisfaction.