langchain.smith.evaluation.runner_utils.run_on_dataset
- langchain.smith.evaluation.runner_utils.run_on_dataset(client: Client, dataset_name: str, llm_or_chain_factory: Union[Callable[[], Chain], BaseLanguageModel], *, evaluation: Optional[RunEvalConfig] = None, num_repetitions: int = 1, project_name: Optional[str] = None, verbose: bool = False, tags: Optional[List[str]] = None, input_mapper: Optional[Callable[[Dict], Any]] = None) → Dict[str, Any]
Run the Chain or language model on a dataset and store the traces in the specified project.
- Parameters
client – LangSmith client to use to access the dataset and to log feedback and run traces.
dataset_name – Name of the dataset to run the chain on.
llm_or_chain_factory – Language model or Chain constructor to run over the dataset. The Chain constructor is used to permit independent calls on each example without carrying over state.
evaluation – Configuration for evaluators to run on the results of the chain.
num_repetitions – Number of times to run the model on each example. This is useful when testing success rates or generating confidence intervals.
project_name – Name of the project to store the traces in. Defaults to {dataset_name}-{chain class name}-{datetime}.
verbose – Whether to print progress.
tags – Tags to add to each run in the project.
input_mapper – A function that maps the inputs dictionary from an Example to the format expected by the model being evaluated. This is useful if your model needs to deserialize a more complex schema or if your dataset's input keys differ from those expected by your chain or agent.
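As a minimal sketch of the input_mapper parameter described above: the function below renames a dataset key to the prompt variable a chain expects. The function name map_example_inputs and the dataset key "question" are hypothetical and chosen only for illustration; "your_input_key" matches the placeholder used in the example chain on this page.

```python
from typing import Any, Dict


def map_example_inputs(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical input_mapper: rename a dataset key for the chain's prompt."""
    # Assumption: the dataset stores each question under "question",
    # while the chain's prompt template expects "your_input_key".
    return {"your_input_key": inputs["question"]}


# Stand-in for the inputs dictionary of a single dataset Example.
example_inputs = {"question": "What is the capital of France?"}
mapped = map_example_inputs(example_inputs)
print(mapped)
```

A mapper like this would be passed as input_mapper=map_example_inputs when calling run_on_dataset.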
- Returns
A dictionary containing the run’s project name and the resulting model outputs.
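When num_repetitions is greater than 1, the repeated outputs can be summarized as a success rate with a rough confidence interval. The sketch below is not part of the library: it assumes you have already reduced each repetition to a pass/fail boolean (the list outcomes is made-up illustrative data), and it uses the simple normal approximation, which is coarse for small samples.

```python
import math
from typing import List, Tuple


def success_rate_with_ci(
    outcomes: List[bool], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) using a normal-approximation interval."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one outcome")
    rate = sum(outcomes) / n
    # Standard error of a proportion; z = 1.96 corresponds to roughly 95%.
    half_width = z * math.sqrt(rate * (1 - rate) / n)
    # Clamp the bounds to the valid probability range [0, 1].
    return rate, max(0.0, rate - half_width), min(1.0, rate + half_width)


# Hypothetical pass/fail results for one example run 10 times.
outcomes = [True, True, False, True, True, False, True, True, True, True]
rate, low, high = success_rate_with_ci(outcomes)
print(f"success rate = {rate:.2f}, approx. 95% CI = [{low:.2f}, {high:.2f}]")
```

For small repetition counts, a Wilson or exact binomial interval would generally be a better choice than the normal approximation used here.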
For the (usually faster) async version of this function, see arun_on_dataset().
Examples
    from langsmith import Client
    from langchain.chat_models import ChatOpenAI
    from langchain.chains import LLMChain
    from langchain.smith import RunEvalConfig, run_on_dataset

    # Chains may have memory. Passing in a constructor function lets the
    # evaluation framework avoid cross-contamination between runs.
    def construct_chain():
        llm = ChatOpenAI(temperature=0)
        chain = LLMChain.from_string(
            llm,
            "What's the answer to {your_input_key}"
        )
        return chain

    # Load off-the-shelf evaluators via config or the EvaluatorType (string or enum)
    evaluation_config = RunEvalConfig(
        evaluators=[
            "qa",  # "Correctness" against a reference answer
            "embedding_distance",
            RunEvalConfig.Criteria("helpfulness"),
            RunEvalConfig.Criteria({
                "fifth-grader-score": "Do you have to be smarter than a fifth grader to answer this question?"
            }),
        ]
    )

    client = Client()
    run_on_dataset(
        client,
        "<my_dataset_name>",
        construct_chain,
        evaluation=evaluation_config,
    )
You can also create custom evaluators by subclassing the StringEvaluator or LangSmith’s RunEvaluator classes.

    from typing import Optional
    from langchain.evaluation import StringEvaluator

    class MyStringEvaluator(StringEvaluator):

        @property
        def requires_input(self) -> bool:
            return False

        @property
        def requires_reference(self) -> bool:
            return True

        @property
        def evaluation_name(self) -> str:
            return "exact_match"

        def _evaluate_strings(self, prediction, reference=None, input=None, **kwargs) -> dict:
            return {"score": prediction == reference}

    evaluation_config = RunEvalConfig(
        custom_evaluators = [MyStringEvaluator()],
    )

    run_on_dataset(
        client,
        "<my_dataset_name>",
        construct_chain,
        evaluation=evaluation_config,
    )