Automated Answer Evaluation for Dataset Questions

This n8n workflow automates evaluating whether answers generated by an AI model match the reference answers in a dataset. It fetches question data from Google Sheets, uses OpenAI's GPT-4 models to generate answers and to judge them against the references, and calculates relevance metrics from the results. It is useful for QA teams, AI training, and benchmarking, since it systematically compares generated responses to authoritative answers. The pipeline has three stages: data retrieval, similarity assessment, and a conditional evaluation step that invokes the more expensive judge model only when needed, reducing cost while keeping answer validation accurate.
| Node Count | 11 – 20 Nodes |
| --- | --- |
| Nodes Used | @n8n/n8n-nodes-langchain.agent, @n8n/n8n-nodes-langchain.chatTrigger, @n8n/n8n-nodes-langchain.lmChatOpenAi, @n8n/n8n-nodes-langchain.openAi, evaluation, evaluationTrigger, noOp, set, stickyNote |
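The cost-saving idea behind the conditional evaluation stage is a "cheap gate, expensive judge" pattern: a fast lexical similarity check accepts near-verbatim matches and rejects clear misses outright, and only ambiguous cases are sent to a GPT-4-class judge. Below is a minimal standalone sketch of that pattern in TypeScript using the OpenAI Node SDK; the function names, thresholds, and model id are illustrative assumptions, not the workflow's actual code.

```typescript
// Sketch of a gated answer-evaluation step, assuming the OpenAI Node SDK.
// Thresholds (0.8 / 0.1) and the model id are illustrative choices.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Cheap first pass: Jaccard overlap between the two answers' word sets.
function tokenOverlap(a: string, b: string): number {
  const tok = (s: string) => new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const sa = tok(a);
  const sb = tok(b);
  const inter = [...sa].filter((w) => sb.has(w)).length;
  const union = new Set([...sa, ...sb]).size;
  return union === 0 ? 0 : inter / union;
}

// Call the GPT-4-class judge only when the cheap score is inconclusive.
async function judgeAnswer(
  question: string,
  reference: string,
  generated: string
): Promise<boolean> {
  const overlap = tokenOverlap(reference, generated);
  if (overlap > 0.8) return true; // near-verbatim match: accept, no API call
  if (overlap < 0.1) return false; // no shared content: reject, no API call

  const resp = await client.chat.completions.create({
    model: "gpt-4o", // assumed model id; the workflow's exact model may differ
    messages: [
      {
        role: "system",
        content:
          "You compare a generated answer to a reference answer. " +
          'Reply with exactly "MATCH" or "NO_MATCH".',
      },
      {
        role: "user",
        content: `Question: ${question}\nReference: ${reference}\nGenerated: ${generated}`,
      },
    ],
  });
  return resp.choices[0].message.content?.trim() === "MATCH";
}
```

Gating the judge this way means most rows in a large evaluation sheet never trigger a model call at all, which is the same trade-off the workflow's conditional evaluation step is designed to make.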