`langchain.smith.evaluation.runner_utils`.run_on_dataset

langchain.smith.evaluation.runner_utils.run_on_dataset(client: Client, dataset_name: str, llm_or_chain_factory: Union[Callable[[], Chain], BaseLanguageModel], *, evaluation: Optional[RunEvalConfig] = None, num_repetitions: int = 1, project_name: Optional[str] = None, verbose: bool = False, tags: Optional[List[str]] = None, input_mapper: Optional[Callable[[Dict], Any]] = None) → Dict[str, Any]

Run the Chain or language model on a dataset and store the traces in the specified project.

Parameters

client – LangSmith client to use to access the dataset and to log feedback and run traces.
dataset_name – Name of the dataset to run the chain on.
llm_or_chain_factory – Language model or Chain constructor to run over the dataset. The Chain constructor is used to permit independent calls on each example without carrying over state.
evaluation – Configuration for the evaluators to run on the results of the chain.
num_repetitions – Number of times to run the model on each example. This is useful when testing success rates or generating confidence intervals.
project_name – Name of the project to store the traces in. Defaults to {dataset_name}-{chain class name}-{datetime}.
verbose – Whether to print progress.
tags – Tags to add to each run in the project.
input_mapper – A function that maps the inputs dictionary from an Example to the format expected by the model being evaluated. This is useful if your model needs to deserialize a more complex schema, or if your dataset has inputs with keys that differ from what your chain or agent expects.
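
As a minimal sketch of what an input_mapper might look like, the function below renames a dataset key to the prompt variable used by the chain in the Examples section. The dataset key "question" and the function name map_inputs are hypothetical; only "your_input_key" comes from the example on this page.

```python
from typing import Any, Dict


def map_inputs(example_inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Rename the dataset's input key to the key the chain's prompt expects.

    "question" is a hypothetical dataset key used for illustration;
    "your_input_key" is the prompt variable from the chain example on this page.
    """
    return {"your_input_key": example_inputs["question"]}


# Try the mapper on a dictionary shaped like one example's inputs.
mapped = map_inputs({"question": "What is 2 + 2?"})
print(mapped)
```

A function like this would be passed as input_mapper=map_inputs in the run_on_dataset call.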

Returns

A dictionary containing the run’s project name and the resulting model outputs.
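
The num_repetitions parameter is described as useful for success rates and confidence intervals. The sketch below shows one way that arithmetic could be done, assuming you have already collected a list of pass/fail outcomes for one example across its repetitions. The helper name success_rate_interval and the list-of-booleans layout are assumptions for illustration, not the schema of the returned dictionary, and the Wilson score interval is one choice among several.

```python
import math
from typing import Sequence, Tuple


def success_rate_interval(
    outcomes: Sequence[bool], z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (rate, low, high) using the Wilson score interval.

    `outcomes` holds one pass/fail flag per repetition (hypothetical layout).
    z = 1.96 corresponds to an approximate 95% interval.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one outcome")
    successes = sum(1 for ok in outcomes if ok)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return p, max(0.0, center - half_width), min(1.0, center + half_width)


# Ten repetitions of one example, eight of which passed.
rate, low, high = success_rate_interval([True] * 8 + [False] * 2)
print(f"rate={rate:.2f}, interval=({low:.2f}, {high:.2f})")
```

With few repetitions the interval is wide, which is one reason to raise num_repetitions when comparing models whose success rates are close.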

For the (usually faster) async version of this function, see arun_on_dataset().

Examples

from langsmith import Client
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.smith import RunEvalConfig, run_on_dataset

# Chains may have memory. Passing in a constructor function lets the
# evaluation framework avoid cross-contamination between runs.
def construct_chain():
    llm = ChatOpenAI(temperature=0)
    chain = LLMChain.from_string(
        llm,
        "What's the answer to {your_input_key}"
    )
    return chain

# Load off-the-shelf evaluators via config or the EvaluatorType (string or enum)
evaluation_config = RunEvalConfig(
    evaluators=[
        "qa",  # "Correctness" against a reference answer
        "embedding_distance",
        RunEvalConfig.Criteria("helpfulness"),
        RunEvalConfig.Criteria({
            "fifth-grader-score": "Do you have to be smarter than a fifth grader to answer this question?"
        }),
    ]
)

client = Client()
run_on_dataset(
    client,
    "<my_dataset_name>",
    construct_chain,
    evaluation=evaluation_config,
)
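
To illustrate why a constructor function is passed rather than a single chain instance, here is a small sketch that does not use LangChain at all. ToyChainWithMemory and construct_toy_chain are hypothetical stand-ins for a chain with memory and its factory.

```python
class ToyChainWithMemory:
    """Stand-in for a chain that keeps state between calls (not a LangChain class)."""

    def __init__(self) -> None:
        self.history = []

    def run(self, text: str) -> str:
        # Each call records its input, so later calls can see earlier ones.
        self.history.append(text)
        return f"{text} (seen {len(self.history)} input(s))"


def construct_toy_chain() -> ToyChainWithMemory:
    # A fresh instance per call means no state is carried between examples.
    return ToyChainWithMemory()


examples = ["a", "b"]

# One shared instance: the second example sees state left by the first.
shared = ToyChainWithMemory()
shared_outputs = [shared.run(q) for q in examples]

# One instance per example via the factory: every run starts clean.
fresh_outputs = [construct_toy_chain().run(q) for q in examples]

print(shared_outputs)
print(fresh_outputs)
```

In the shared case the second output reports two inputs seen, while with the factory each output reports one.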

You can also create custom evaluators by subclassing the StringEvaluator or LangSmith’s RunEvaluator classes.

from typing import Optional
from langchain.evaluation import StringEvaluator
from langchain.smith import RunEvalConfig, run_on_dataset

class MyStringEvaluator(StringEvaluator):

    @property
    def requires_input(self) -> bool:
        return False

    @property
    def requires_reference(self) -> bool:
        return True

    @property
    def evaluation_name(self) -> str:
        return "exact_match"

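    def _evaluate_strings(
        self,
        prediction: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs,
    ) -> dict:
        # The score is True only when the prediction equals the reference exactly.
        return {"score": prediction == reference}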

evaluation_config = RunEvalConfig(
    custom_evaluators=[MyStringEvaluator()],
)

# Reuses `client` and `construct_chain` from the previous example.
run_on_dataset(
    client,
    "<my_dataset_name>",
    construct_chain,
    evaluation=evaluation_config,
)

Examples using run_on_dataset

LangSmith Walkthrough
