

ToolCallAccuracyEvaluator Class

Definition

An IEvaluator that evaluates an AI system's effectiveness at using the tools supplied to it.

public ref class ToolCallAccuracyEvaluator sealed : Microsoft::Extensions::AI::Evaluation::IEvaluator
[System.Diagnostics.CodeAnalysis.Experimental("AIEVAL001")]
public sealed class ToolCallAccuracyEvaluator : Microsoft.Extensions.AI.Evaluation.IEvaluator
[<System.Diagnostics.CodeAnalysis.Experimental("AIEVAL001")>]
type ToolCallAccuracyEvaluator = class
    interface IEvaluator
Public NotInheritable Class ToolCallAccuracyEvaluator
Implements IEvaluator
Inheritance
Object → ToolCallAccuracyEvaluator
Attributes
ExperimentalAttribute
Implements
IEvaluator

Remarks

ToolCallAccuracyEvaluator measures how accurately an AI system uses tools. It examines the tool calls (that is, the FunctionCallContents) present in the supplied response and assesses three things: the relevance of these tool calls to the conversation, the correctness of their parameters with regard to the tool definitions supplied via ToolDefinitions, and the accuracy with which parameter values were extracted from the supplied conversation.

Currently, ToolCallAccuracyEvaluator supports evaluating only calls to tools that are defined as AIFunctions. Any other AITool definitions supplied via ToolDefinitions are ignored.

ToolCallAccuracyEvaluator returns a BooleanMetric that contains a score for 'Tool Call Accuracy'. The score is false if the tool call is irrelevant or contains information not present in the conversation, and true if the tool call is relevant and its parameters are properly extracted from the conversation.
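The flow described above can be sketched roughly as follows. This is a minimal illustration, not the library's canonical sample: it assumes the tool definitions are passed as additional context through a ToolCallAccuracyEvaluatorContext, and the `get_weather` tool, the `judgeClient` parameter, and the hand-built response are hypothetical stand-ins for a real conversation and a real IChatClient.

```csharp
#pragma warning disable AIEVAL001 // The evaluator is marked experimental.

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

public static class ToolCallAccuracyExample
{
    // Hypothetical tool, used only for illustration.
    private static string GetWeather(string city) => $"Sunny in {city}";

    // judgeClient is an assumption: any IChatClient that talks to the
    // model performing the evaluation (for example, GPT-4o).
    public static async Task<bool?> CheckToolCallsAsync(IChatClient judgeClient)
    {
        // Only tools defined as AIFunctions are evaluated.
        AIFunction weatherTool = AIFunctionFactory.Create(
            GetWeather,
            name: "get_weather",
            description: "Gets the weather for a city.");

        // The conversation that led to the tool call.
        var messages = new List<ChatMessage>
        {
            new ChatMessage(ChatRole.User, "What is the weather in Seattle?")
        };

        // A hand-built response containing one tool call. In practice this
        // would come from the AI system under evaluation.
        var toolCall = new FunctionCallContent(
            "call-1",
            "get_weather",
            new Dictionary<string, object?> { ["city"] = "Seattle" });
        var response = new ChatResponse(
            new ChatMessage(ChatRole.Assistant, new List<AIContent> { toolCall }));

        var evaluator = new ToolCallAccuracyEvaluator();

        // Assumption: tool definitions reach the evaluator via this context type.
        var context = new ToolCallAccuracyEvaluatorContext(weatherTool);

        EvaluationResult result = await evaluator.EvaluateAsync(
            messages,
            response,
            new ChatConfiguration(judgeClient),
            additionalContext: new EvaluationContext[] { context });

        // The metric value is nullable; null means no score was produced.
        BooleanMetric metric = result.Get<BooleanMetric>(
            ToolCallAccuracyEvaluator.ToolCallAccuracyMetricName);
        return metric.Value;
    }
}
```

Because the score comes from an AI model, the returned value can differ between runs and between models, so no expected output is shown here.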

Note: ToolCallAccuracyEvaluator is an AI-based evaluator that uses an AI model to perform its evaluation. While the prompt that this evaluator uses is designed to be model-agnostic, the performance of this prompt (and the resulting evaluation) can vary depending on the model used, and can be especially poor when a smaller or local model is used.

The prompt that ToolCallAccuracyEvaluator uses has been tested against, and tuned to work well with, the following models, so using this evaluator with one of them is likely to produce the best results. (The model to use can be configured via ChatClient.)

GPT-4o

Constructors

ToolCallAccuracyEvaluator()

Properties

EvaluationMetricNames

Gets the Names of the EvaluationMetrics produced by this IEvaluator.

ToolCallAccuracyMetricName

Gets the Name of the BooleanMetric returned by ToolCallAccuracyEvaluator.

Methods

EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.

Extension Methods

EvaluateAsync(IEvaluator, ChatMessage, ChatMessage, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.

EvaluateAsync(IEvaluator, ChatMessage, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.

EvaluateAsync(IEvaluator, ChatMessage, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.

EvaluateAsync(IEvaluator, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.

EvaluateAsync(IEvaluator, String, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.

EvaluateAsync(IEvaluator, String, String, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Evaluates the supplied modelResponse and returns an EvaluationResult containing one or more EvaluationMetrics.
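As a rough sketch of how one of the extension overloads listed above might be called, the helper below forwards a single user request and a response to the (IEvaluator, ChatMessage, ChatResponse, ...) overload. The helper name and its parameters are hypothetical. For ToolCallAccuracyEvaluator specifically, the overloads that accept a ChatResponse or ChatMessage are likely the relevant ones, on the assumption that a plain String response carries no FunctionCallContents to examine.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;

public static class ExtensionOverloadExample
{
    // Hypothetical helper: evaluates a single-turn exchange.
    public static async Task<EvaluationResult> EvaluateSingleTurnAsync(
        IEvaluator evaluator,
        ChatMessage userRequest,
        ChatResponse modelResponse,
        ChatConfiguration chatConfiguration,
        IEnumerable<EvaluationContext> additionalContext)
    {
        // A single ChatMessage does not match the interface method's
        // IEnumerable<ChatMessage> parameter, so this call binds to the
        // extension overload that takes one user request.
        return await evaluator.EvaluateAsync(
            userRequest,
            modelResponse,
            chatConfiguration,
            additionalContext);
    }
}
```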
