langchain.smith.evaluation.config.RunEvalConfig¶

class langchain.smith.evaluation.config.RunEvalConfig(*, evaluators: List[Union[EvaluatorType, EvalConfig]] = None, custom_evaluators: Optional[List[Union[RunEvaluator, StringEvaluator]]] = None, reference_key: Optional[str] = None, prediction_key: Optional[str] = None, input_key: Optional[str] = None, eval_llm: Optional[BaseLanguageModel] = None)[source]¶

Bases: BaseModel

Configuration for a run evaluation.

Parameters

evaluators (List[Union[EvaluatorType, EvalConfig]]) – Configurations for which evaluators to apply to the dataset run. Each entry can be an EvaluatorType member (such as EvaluatorType.QA), the corresponding evaluator type string (“qa”), or a configuration for a given evaluator (e.g., RunEvalConfig.QA).
custom_evaluators (Optional[List[Union[RunEvaluator, StringEvaluator]]]) – Custom evaluators to apply to the dataset run.
reference_key (Optional[str]) – The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
prediction_key (Optional[str]) – The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
input_key (Optional[str]) – The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
eval_llm (Optional[BaseLanguageModel]) – The language model to pass to any evaluators that use a language model.
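The three accepted forms of an evaluators entry can be illustrated with a small sketch that uses only the standard library. The EvaluatorType enum and QAConfig class below are local stand-ins defined for this example (they are not imported from LangChain), and resolve_type is a hypothetical helper rather than part of the library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class EvaluatorType(str, Enum):
    """Local stand-in for the library enum; only two members are sketched."""
    QA = "qa"
    CRITERIA = "criteria"


@dataclass
class QAConfig:
    """Local stand-in for an evaluator configuration such as RunEvalConfig.QA."""
    evaluator_type: EvaluatorType = EvaluatorType.QA


def resolve_type(entry: Union[EvaluatorType, str, QAConfig]) -> EvaluatorType:
    """Hypothetical helper: map any accepted entry form to an evaluator type."""
    if isinstance(entry, EvaluatorType):
        return entry
    if isinstance(entry, str):
        return EvaluatorType(entry)  # raises ValueError for unknown strings
    return entry.evaluator_type


# Enum member, type string, and configuration object side by side.
evaluators = [EvaluatorType.QA, "qa", QAConfig()]
resolved = [resolve_type(e) for e in evaluators]
```

In this sketch all three entries resolve to the same evaluator type; a configuration object is the form to reach for when an evaluator needs its own settings.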

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param custom_evaluators: Optional[List[Union[langsmith.evaluation.evaluator.RunEvaluator, langchain.evaluation.schema.StringEvaluator]]] = None¶: Custom evaluators to apply to the dataset run.

param eval_llm: Optional[langchain.schema.language_model.BaseLanguageModel] = None¶: The language model to pass to any evaluators that require one.

param evaluators: List[Union[langchain.evaluation.schema.EvaluatorType, langchain.smith.evaluation.config.EvalConfig]] [Optional]¶: Configurations for which evaluators to apply to the dataset run. Each entry can be an EvaluatorType member (such as EvaluatorType.QA), the corresponding evaluator type string (“qa”), or a configuration for a given evaluator (e.g., RunEvalConfig.QA).

param input_key: Optional[str] = None¶: The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.

param prediction_key: Optional[str] = None¶: The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.

param reference_key: Optional[str] = None¶: The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
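The descriptions above do not spell out how a missing key is inferred. One plausible rule, shown here purely as an assumption, is to use the only key when a dictionary has exactly one and to require an explicit key otherwise. infer_key is a hypothetical helper written for this illustration, and the sample dictionaries are invented.

```python
from typing import Dict, Optional


def infer_key(values: Dict[str, object], explicit_key: Optional[str] = None) -> str:
    """Return explicit_key if given, else the sole key of values (assumed rule)."""
    if explicit_key is not None:
        if explicit_key not in values:
            raise KeyError(f"{explicit_key!r} not found in {sorted(values)}")
        return explicit_key
    if len(values) == 1:
        return next(iter(values))
    raise ValueError(
        f"Cannot infer a key from {sorted(values)}; pass one explicitly."
    )


# A single output can be picked up without configuration.
outputs = {"answer": "Paris"}
prediction_key = infer_key(outputs)

# Several outputs are ambiguous, so the key is named explicitly.
multi_outputs = {"answer": "Paris", "sources": "atlas.pdf"}
chosen_key = infer_key(multi_outputs, explicit_key="answer")
```

Under this assumed rule, setting reference_key, prediction_key, or input_key explicitly is the safer choice whenever a run or dataset example carries more than one field.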

class CoTQA[source]¶

Bases: EvalConfig

Configuration for a context-based QA evaluator.

Parameters

prompt (Optional[BasePromptTemplate]) – The prompt template to use for generating the question.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

evaluator_type (langchain.evaluation.schema.EvaluatorType)
llm (Optional[langchain.schema.language_model.BaseLanguageModel])
prompt (Optional[langchain.schema.prompt_template.BasePromptTemplate])

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.CONTEXT_QA¶

param llm: Optional[langchain.schema.language_model.BaseLanguageModel] = None¶

param prompt: Optional[langchain.schema.prompt_template.BasePromptTemplate] = None¶

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]
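As a rough illustration of what get_kwargs could return, the sketch below collects a configuration's fields into a dictionary. It assumes that evaluator_type and fields left as None are omitted; this page does not confirm that behaviour. SketchQAConfig is a stand-in dataclass, not the library class, and the string passed as llm is a placeholder for a real language model object.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SketchQAConfig:
    """Stand-in for a QA-style config; real fields hold model/prompt objects."""
    evaluator_type: str = "context_qa"
    llm: Optional[Any] = None
    prompt: Optional[Any] = None

    def get_kwargs(self) -> Dict[str, Any]:
        """Collect set fields as keyword arguments (assumed behaviour)."""
        kwargs: Dict[str, Any] = {}
        for f in fields(self):
            value = getattr(self, f.name)
            # Assumption: the type selects the evaluator and is passed
            # separately, and unset (None) fields fall back to defaults.
            if f.name == "evaluator_type" or value is None:
                continue
            kwargs[f.name] = value
        return kwargs


config = SketchQAConfig(llm="placeholder-llm")
kwargs = config.get_kwargs()
```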

class ContextQA[source]¶

Bases: EvalConfig

Configuration for a context-based QA evaluator.

Parameters

prompt (Optional[BasePromptTemplate]) – The prompt template to use for generating the question.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

evaluator_type (langchain.evaluation.schema.EvaluatorType)
llm (Optional[langchain.schema.language_model.BaseLanguageModel])
prompt (Optional[langchain.schema.prompt_template.BasePromptTemplate])

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.CONTEXT_QA¶

param llm: Optional[langchain.schema.language_model.BaseLanguageModel] = None¶

param prompt: Optional[langchain.schema.prompt_template.BasePromptTemplate] = None¶

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]

class Criteria[source]¶

Bases: EvalConfig

Configuration for a reference-free criteria evaluator.

Parameters

criteria (Optional[CRITERIA_TYPE]) – The criteria to evaluate.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

criteria (Optional[Union[Mapping[str, str], langchain.evaluation.criteria.eval_chain.Criteria, langchain.chains.constitutional_ai.models.ConstitutionalPrinciple]])
evaluator_type (langchain.evaluation.schema.EvaluatorType)
llm (Optional[langchain.schema.language_model.BaseLanguageModel])

param criteria: Optional[Union[Mapping[str, str], langchain.evaluation.criteria.eval_chain.Criteria, langchain.chains.constitutional_ai.models.ConstitutionalPrinciple]] = None¶

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.CRITERIA¶

param llm: Optional[langchain.schema.language_model.BaseLanguageModel] = None¶

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]
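Per the field type above, custom criteria can be supplied as a mapping from a criterion name to its description. The sketch below shows that shape and one way such a mapping could be flattened into text for a grading prompt. format_criteria is a hypothetical helper, and the criterion wording is invented for illustration.

```python
from typing import Mapping


def format_criteria(criteria: Mapping[str, str]) -> str:
    """Render each criterion as 'name: description', one per line."""
    return "\n".join(f"{name}: {desc}" for name, desc in criteria.items())


# A Mapping[str, str] of criterion name to the question the grader answers.
custom_criteria = {
    "conciseness": "Is the submission brief and to the point?",
    "politeness": "Is the submission courteous in tone?",
}
rendered = format_criteria(custom_criteria)
```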

class EmbeddingDistance[source]¶

Bases: EvalConfig

Configuration for an embedding distance evaluator.

Parameters

embeddings (Optional[Embeddings]) – The embeddings to use for computing the distance.
distance_metric (Optional[EmbeddingDistanceEnum]) – The distance metric to use for computing the distance.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

distance_metric (Optional[langchain.evaluation.embedding_distance.base.EmbeddingDistance])
embeddings (Optional[langchain.embeddings.base.Embeddings])
evaluator_type (langchain.evaluation.schema.EvaluatorType)

param distance_metric: Optional[langchain.evaluation.embedding_distance.base.EmbeddingDistance] = None¶

param embeddings: Optional[langchain.embeddings.base.Embeddings] = None¶

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.EMBEDDING_DISTANCE¶

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]
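To make the idea of an embedding distance concrete, the sketch below computes cosine distance between two vectors in plain Python. Cosine distance is used only as a common example; this page does not list which metrics the distance_metric enum offers, and the two short vectors stand in for real embeddings of a prediction and a reference.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length.")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine distance is undefined for zero vectors.")
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
prediction_vec = [1.0, 0.0]
reference_vec = [0.0, 1.0]
distance = cosine_distance(prediction_vec, reference_vec)
```

A smaller distance means the prediction's embedding points in a direction closer to the reference's.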

class LabeledCriteria[source]¶

Bases: EvalConfig

Configuration for a labeled (with references) criteria evaluator.

Parameters

criteria (Optional[CRITERIA_TYPE]) – The criteria to evaluate.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

criteria (Optional[Union[Mapping[str, str], langchain.evaluation.criteria.eval_chain.Criteria, langchain.chains.constitutional_ai.models.ConstitutionalPrinciple]])
evaluator_type (langchain.evaluation.schema.EvaluatorType)
llm (Optional[langchain.schema.language_model.BaseLanguageModel])

param criteria: Optional[Union[Mapping[str, str], langchain.evaluation.criteria.eval_chain.Criteria, langchain.chains.constitutional_ai.models.ConstitutionalPrinciple]] = None¶

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.LABELED_CRITERIA¶

param llm: Optional[langchain.schema.language_model.BaseLanguageModel] = None¶

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]

class QA[source]¶

Bases: EvalConfig

Configuration for a QA evaluator.

Parameters

prompt (Optional[BasePromptTemplate]) – The prompt template to use for generating the question.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

evaluator_type (langchain.evaluation.schema.EvaluatorType)
llm (Optional[langchain.schema.language_model.BaseLanguageModel])
prompt (Optional[langchain.schema.prompt_template.BasePromptTemplate])

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.QA¶

param llm: Optional[langchain.schema.language_model.BaseLanguageModel] = None¶

param prompt: Optional[langchain.schema.prompt_template.BasePromptTemplate] = None¶

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]

class StringDistance[source]¶

Bases: EvalConfig

Configuration for a string distance evaluator.

Parameters: distance (Optional[StringDistanceEnum]) – The string distance metric to use.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

Fields

distance (Optional[langchain.evaluation.string_distance.base.StringDistance])
evaluator_type (langchain.evaluation.schema.EvaluatorType)
normalize_score (bool)

param distance: Optional[langchain.evaluation.string_distance.base.StringDistance] = None¶: The string distance metric to use.

damerau_levenshtein: The Damerau-Levenshtein distance.
levenshtein: The Levenshtein distance.
jaro: The Jaro distance.
jaro_winkler: The Jaro-Winkler distance.

param evaluator_type: langchain.evaluation.schema.EvaluatorType = EvaluatorType.STRING_DISTANCE¶

param normalize_score: bool = True¶: Whether to normalize the distance to a value between 0 and 1. Applies only to the Levenshtein and Damerau-Levenshtein distances.

get_kwargs() → Dict[str, Any]¶

Get the keyword arguments for the load_evaluator call.

Returns: The keyword arguments for the load_evaluator call.
Return type: Dict[str, Any]
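The effect of normalize_score can be sketched with a plain-Python Levenshtein distance. The normalization shown, dividing by the length of the longer string, is an assumption made for illustration; the page only states that the result lies between 0 and 1. Both functions are written for this example and are not the library's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Scale the distance into [0, 1] by the longer string's length (assumed)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


raw = levenshtein("kitten", "sitting")
score = normalized_levenshtein("kitten", "sitting")
```

A normalized score makes results comparable across predictions of different lengths, whereas the raw edit count grows with string length.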

model Config[source]¶

Bases: object

arbitrary_types_allowed = True¶

Examples using RunEvalConfig¶

LangSmith Walkthrough
