inspect_flow
Types
FlowAgent
Configuration for an Agent.
class FlowAgent(FlowBase)
Attributes
name: str | None | NotGiven
Name of the agent. Used to create the agent if the factory is not provided.
factory: Callable[..., Agent] | None | NotGiven
Factory function to create the agent instance.
args: CreateArgs | None | NotGiven
Additional args to pass to agent constructor.
flow_metadata: dict[str, Any] | None | NotGiven
Optional. Metadata stored in the flow config. Not passed to the agent.
type: Literal['agent'] | None
Type needed to differentiate solvers from agents in solver lists.
FlowDefaults
Default field values for Inspect objects. Will be overridden by more specific settings.
class FlowDefaults(FlowBase)
Attributes
config: GenerateConfig | None | NotGiven
Default model generation options. Will be overridden by settings on the FlowModel and FlowTask.
agent: FlowAgent | None | NotGiven
Field defaults for agents.
agent_prefix: dict[str, FlowAgent] | None | NotGiven
Agent defaults for agent name prefixes. E.g. {'inspect/': FAgent(…)}
model: FlowModel | None | NotGiven
Field defaults for models.
model_prefix: dict[str, FlowModel] | None | NotGiven
Model defaults for model name prefixes. E.g. {'openai/': FModel(…)}
solver: FlowSolver | None | NotGiven
Field defaults for solvers.
solver_prefix: dict[str, FlowSolver] | None | NotGiven
Solver defaults for solver name prefixes. E.g. {'inspect/': FSolver(…)}
task: FlowTask | None | NotGiven
Field defaults for tasks.
task_prefix: dict[str, FlowTask] | None | NotGiven
Task defaults for task name prefixes. E.g. {'inspect_evals/': FTask(…)}
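The layering of general defaults, prefix defaults, and explicit settings can be pictured with plain dictionaries. This is only a sketch, not the library's code: `resolve_defaults` is a hypothetical helper, and the rule that several matching prefixes apply shortest-first is an assumption the reference does not state.

```python
def resolve_defaults(name, general, by_prefix, explicit):
    """Layer settings from least to most specific for one named object."""
    resolved = dict(general)
    # Assumption: every matching prefix applies, shortest first, so that a
    # longer (more specific) prefix overrides a shorter one.
    for prefix in sorted(by_prefix, key=len):
        if name.startswith(prefix):
            resolved.update(by_prefix[prefix])
    # Fields set explicitly on the object are the most specific and win.
    resolved.update(explicit)
    return resolved


settings = resolve_defaults(
    "openai/gpt-4o",
    general={"memoize": True, "base_url": None},
    by_prefix={"openai/": {"base_url": "https://example.invalid/v1"}},
    explicit={"memoize": False},
)
print(settings)
```

Here the prefix entry supplies `base_url`, while the explicit `memoize` value overrides the general default.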
FlowDependencies
Configuration for flow dependencies to install in the venv.
class FlowDependencies(FlowBase)
Attributes
dependency_file: Literal['auto', 'no_file'] | str | None | NotGiven
Path to a dependency file (either requirements.txt or pyproject.toml) to use to create the virtual environment. If 'auto', will search the path starting from the same directory as the config file (when using the CLI) or base_dir arg (when using the API) looking for pyproject.toml or requirements.txt files. If 'no_file', no dependency file will be used. Defaults to 'auto'.
additional_dependencies: str | Sequence[str] | None | NotGiven
Dependencies to pip install. E.g. PyPI package specifiers or Git repository URLs.
auto_detect_dependencies: bool | None | NotGiven
If True, automatically detect and install dependencies from names of objects in the config (defaults to True). For example, if a model name starts with 'openai/', the 'openai' package will be installed. If a task name is 'inspect_evals/mmlu' then the 'inspect-evals' package will be installed.
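The two examples given for auto_detect_dependencies are consistent with a simple rule: take the part of the name before the first slash and replace underscores with hyphens. The sketch below encodes that guess; `guess_package` is a hypothetical name, and the real detection logic may differ (for instance, it may use a lookup table).

```python
def guess_package(object_name):
    """Guess a package name from a registry-style object name, or None."""
    if "/" not in object_name:
        return None
    prefix = object_name.split("/", 1)[0]
    # File-style names such as "./my_task.py" carry no package prefix.
    if not prefix or prefix.startswith("."):
        return None
    # Assumption: distribution names use hyphens where module names use underscores.
    return prefix.replace("_", "-")


print(guess_package("openai/gpt-4o"))
print(guess_package("inspect_evals/mmlu"))
```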
uv_sync_args: str | Sequence[str] | None | NotGiven
Additional arguments to pass to 'uv sync' when creating the virtual environment using a pyproject.toml file. May be a string ('--dev --extra test') or a list of strings (['--dev', '--extra', 'test']).
FlowEpochs
Configuration for task epochs.
Number of epochs to repeat samples over and optionally one or more reducers used to combine scores from samples across epochs. If not specified the “mean” score reducer is used.
class FlowEpochs(FlowBase)
Attributes
epochs: int
Number of epochs.
reducer: str | Sequence[str] | None | NotGiven
One or more reducers used to combine scores from samples across epochs (defaults to “mean”).
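To make the role of a reducer concrete, the sketch below collapses each sample's per-epoch scores into one value. It is illustrative only: `reduce_epochs` and the `REDUCERS` table are hypothetical, and “max” is included purely as a second example next to the documented “mean” default.

```python
from statistics import mean

# Hypothetical reducer table; only "mean" is named in the text above.
REDUCERS = {"mean": mean, "max": max}


def reduce_epochs(scores_by_sample, reducer="mean"):
    """Combine each sample's scores across epochs into a single score."""
    reduce = REDUCERS[reducer]
    return {sample: reduce(scores) for sample, scores in scores_by_sample.items()}


# Four epochs per sample.
scores = {
    "sample-1": [1.0, 0.0, 1.0, 1.0],
    "sample-2": [0.0, 0.0, 1.0, 0.0],
}
reduced = reduce_epochs(scores)
print(reduced)
```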
FlowSpec
Top-level flow specification.
class FlowSpec(FlowBase, arbitrary_types_allowed=True)
Attributes
includes: Sequence[str | FlowSpec] | None | NotGiven
List of other flow specs to include. Relative paths will be resolved relative to the config file (when using the CLI) or base_dir arg (when using the API). In addition to this list of explicit files to include, any _flow.py files in the same directory or any parent directory of the config file (when using the CLI) or base_dir arg (when using the API) will also be included automatically.
log_dir: str | None | NotGiven
Output path for logging results (required to ensure that a unique storage scope is assigned). Must be set before running the flow spec. Relative paths will be resolved relative to the config file (when using the CLI) or base_dir arg (when using the API).
log_dir_create_unique: bool | None | NotGiven
If True, create a new log directory by appending an _ and numeric suffix if the specified log_dir already exists. If the directory exists and has a _numeric suffix, that suffix will be incremented. If False, use the existing log_dir (which must be empty or have log_dir_allow_dirty=True). Defaults to False.
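The suffix rule for log_dir_create_unique can be sketched as a small loop. This is an approximation under stated assumptions: `next_log_dir` is a hypothetical helper, the first suffix is assumed to be 1, and existence is checked through a callable so the example needs no filesystem.

```python
import re


def next_log_dir(log_dir, exists):
    """Return log_dir, or the first suffixed variant that does not exist yet."""
    candidate = log_dir
    while exists(candidate):
        match = re.fullmatch(r"(.*)_(\d+)", candidate)
        if match:
            # An existing numeric suffix is incremented.
            candidate = f"{match.group(1)}_{int(match.group(2)) + 1}"
        else:
            # Assumption: numbering starts at 1.
            candidate = f"{candidate}_1"
    return candidate


existing = {"logs/run", "logs/run_1"}
chosen = next_log_dir("logs/run", existing.__contains__)
print(chosen)
```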
execution_type: Literal['inproc', 'venv'] | None | NotGiven
Execution environment for running tasks (defaults to 'inproc').
python_version: str | None | NotGiven
Python version to use in the flow virtual environment (e.g. '3.11').
dependencies: FlowDependencies | None | NotGiven
Dependencies to install in the venv. Defaults to auto-detecting dependencies from pyproject.toml, requirements.txt, and object names in the config.
options: FlowOptions | None | NotGiven
Arguments for calls to eval_set.
env: dict[str, str] | None | NotGiven
Environment variables to set when running tasks.
defaults: FlowDefaults | None | NotGiven
Default values for Inspect objects.
flow_metadata: dict[str, Any] | None | NotGiven
Optional. Metadata stored in the flow config. Not passed to the model.
tasks: Sequence[str | FlowTask | Task] | None | NotGiven
Tasks to run.
FlowModel
Configuration for a Model.
class FlowModel(FlowBase)
Attributes
name: str | None | NotGiven
Name of the model to use. If factory is not provided, this is used to create the model.
factory: Callable[..., Model] | None | NotGiven
Factory function to create the model instance.
role: str | None | NotGiven
Optional named role for model (e.g. for roles specified at the task or eval level). Provide a default as a fallback in the case where the role hasn’t been externally specified.
default: str | None | NotGiven
Optional. Fallback model in case the specified model or role is not found. Should be a fully qualified model name (e.g. openai/gpt-4o).
config: GenerateConfig | None | NotGiven
Configuration for the model. These values override settings on the FlowTask and FlowSpec.
base_url: str | None | NotGiven
Optional. Alternate base URL for model.
api_key: str | None | NotGiven
Optional. API key for model.
memoize: bool | None | NotGiven
Use/store a cached version of the model based on the parameters to get_model(). Defaults to True.
model_args: CreateArgs | None | NotGiven
Additional args to pass to model constructor.
flow_metadata: dict[str, Any] | None | NotGiven
Optional. Metadata stored in the flow config. Not passed to the model.
FlowOptions
Evaluation options.
class FlowOptions(FlowBase)
Attributes
retry_attempts: int | None | NotGiven
Maximum number of retry attempts before giving up (defaults to 10).
retry_wait: float | None | NotGiven
Time to wait between attempts, increased exponentially (defaults to 30, resulting in waits of 30, 60, 120, 240, etc.). Wait time per-retry will in no case be longer than 1 hour.
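The wait schedule described for retry_wait (doubling from the base value, capped at one hour) can be written out as a short calculation. This is a sketch of the arithmetic only; `retry_waits` is a hypothetical helper, and the assumption is that the cap is applied to each wait individually.

```python
def retry_waits(retry_wait=30.0, retry_attempts=10, max_wait=3600.0):
    """List the wait (in seconds) before each retry attempt."""
    # Each wait doubles the previous one; none exceeds max_wait (1 hour).
    return [min(retry_wait * 2**attempt, max_wait) for attempt in range(retry_attempts)]


waits = retry_waits()
print(waits)
```

With the defaults, the eighth wait would be 3840 seconds uncapped, so it and all later waits are held at 3600.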
retry_connections: float | None | NotGiven
Reduce max_connections at this rate with each retry (defaults to 1.0, which results in no reduction).
retry_cleanup: bool | None | NotGiven
Cleanup failed log files after retries (defaults to True).
sandbox: SandboxEnvironmentType | None | NotGiven
Sandbox environment type (or optionally a str or tuple with a shorthand spec).
sandbox_cleanup: bool | None | NotGiven
Cleanup sandbox environments after task completes (defaults to True).
tags: Sequence[str] | None | NotGiven
Tags to associate with this evaluation run.
metadata: dict[str, Any] | None | NotGiven
Metadata to associate with this evaluation run.
trace: bool | None | NotGiven
Trace message interactions with evaluated model to terminal.
display: DisplayType | None | NotGiven
Task display type (defaults to 'full').
approval: str | ApprovalPolicyConfig | None | NotGiven
Tool use approval policies. Either a path to an approval policy config file or a list of approval policies. Defaults to no approval policy.
score: bool | None | NotGiven
Score output (defaults to True).
log_level: str | None | NotGiven
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
log_level_transcript: str | None | NotGiven
Level for logging to the log file (defaults to “info”).
log_format: Literal['eval', 'json'] | None | NotGiven
Format for writing log files (defaults to “eval”, the native high-performance format).
limit: int | None | NotGiven
Limit evaluated samples (defaults to all samples).
sample_shuffle: bool | int | None | NotGiven
Shuffle order of samples (pass a seed to make the order deterministic).
fail_on_error: bool | float | None | NotGiven
True to fail on first sample error (default); False to never fail on sample errors; a value between 0 and 1 to fail if that proportion of total samples fails; a value greater than 1 to fail the eval if that count of samples fails.
continue_on_fail: bool | None | NotGiven
True to continue running and only fail at the end if the fail_on_error condition is met; False to fail the eval immediately when the fail_on_error condition is met (default).
retry_on_error: int | None | NotGiven
Number of times to retry samples if they encounter errors (defaults to 3).
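The four kinds of fail_on_error value described above can be read as a single threshold check. The sketch below is one possible reading, not the library's implementation: `threshold_reached` is a hypothetical name, the comparisons are assumed to be inclusive, and a value of exactly 1 is assumed to be a count.

```python
def threshold_reached(fail_on_error, errors, total):
    """Decide whether the fail_on_error condition is met."""
    # Check the booleans by identity first, since bool is a subclass of int.
    if fail_on_error is True:
        return errors >= 1
    if fail_on_error is False:
        return False
    if fail_on_error < 1:
        # Proportion of total samples (assumed inclusive).
        return total > 0 and errors / total >= fail_on_error
    # Count of failed samples (assumed inclusive).
    return errors >= fail_on_error


print(threshold_reached(0.5, errors=5, total=10))
print(threshold_reached(3, errors=2, total=10))
```

Under continue_on_fail=True, a check like this would be evaluated once at the end of the run rather than after each sample error.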
debug_errors: bool | None | NotGiven
Raise task errors (rather than logging them) so they can be debugged (defaults to False).
max_samples: int | None | NotGiven
Maximum number of samples to run in parallel (defaults to max_connections).
max_tasks: int | None | NotGiven
Maximum number of tasks to run in parallel (defaults to 10).
max_subprocesses: int | None | NotGiven
Maximum number of subprocesses to run in parallel (defaults to os.cpu_count()).
max_sandboxes: int | None | NotGiven
Maximum number of sandboxes (per-provider) to run in parallel.
log_samples: bool | None | NotGiven
Log detailed samples and scores (defaults to True).
log_realtime: bool | None | NotGiven
Log events in realtime (enables live viewing of samples in inspect view) (defaults to True).
log_images: bool | None | NotGiven
Log base64 encoded version of images, even if specified as a filename or URL (defaults to False).
log_buffer: int | None | NotGiven
Number of samples to buffer before writing log file. If not specified, an appropriate default for the format and filesystem is chosen (10 in most cases, 100 for JSON logs on remote filesystems).
log_shared: bool | int | None | NotGiven
Sync sample events to log directory so that users on other systems can see log updates in realtime (defaults to no syncing). Specify True to sync every 10 seconds, or an integer to sync every n seconds.
bundle_dir: str | None | NotGiven
If specified, the log viewer and logs generated by this eval set will be bundled into this directory. Relative paths will be resolved relative to the config file (when using the CLI) or base_dir arg (when using the API).
bundle_overwrite: bool | None | NotGiven
Whether to overwrite files in the bundle_dir (defaults to False).
log_dir_allow_dirty: bool | None | NotGiven
If True, allow the log directory to contain unrelated logs. If False, ensure that the log directory only contains logs for tasks in this eval set (defaults to False).
eval_set_id: str | None | NotGiven
ID for the eval set. If not specified, a unique ID will be generated.
bundle_url_mappings: dict[str, str] | None | NotGiven
Replacements applied to bundle_dir to generate a URL. If provided and bundle_dir is set, the mapped URL will be written to stdout.
FlowScorer
Configuration for a Scorer.
class FlowScorer(FlowBase)
Attributes
name: str | None | NotGiven
Name of the scorer. Used to create the scorer if the factory is not provided.
factory: Callable[..., Scorer] | None | NotGiven
Factory function to create the scorer instance.
args: CreateArgs | None | NotGiven
Additional args to pass to scorer constructor.
flow_metadata: dict[str, Any] | None | NotGiven
Optional. Metadata stored in the flow config. Not passed to the scorer.
FlowSolver
Configuration for a Solver.
class FlowSolver(FlowBase)
Attributes
name: str | None | NotGiven
Name of the solver. Used to create the solver if the factory is not provided.
factory: Callable[..., Solver] | None | NotGiven
Factory function to create the solver instance.
args: CreateArgs | None | NotGiven
Additional args to pass to solver constructor.
flow_metadata: dict[str, Any] | None | NotGiven
Optional. Metadata stored in the flow config. Not passed to the solver.
FlowTask
Configuration for an evaluation task.
Tasks are the basis for defining and running evaluations.
class FlowTask(FlowBase, arbitrary_types_allowed=True)
Attributes
name: str | None | NotGiven
Task name. Any of registry name (“inspect_evals/mbpp”), file name (“./my_task.py”), or a file name and attr (“./my_task.py@task_name”). Used to create the task if the factory is not provided.
factory: Callable[..., Task] | None | NotGiven
Factory function to create the task instance.
args: CreateArgs | None | NotGiven
Additional args to pass to task constructor.
extra_args: FlowExtraArgs | None | NotGiven
Extra args to provide to creation of inspect objects for this task. Will override args provided in the 'args' field on the FlowModel, FlowSolver, FlowScorer, and FlowAgent.
solver: str | FlowSolver | FlowAgent | Solver | Agent | Sequence[str | FlowSolver | Solver] | None | NotGiven
Solver or list of solvers. Defaults to generate(), a normal call to the model.
scorer: str | FlowScorer | Scorer | Sequence[str | FlowScorer | Scorer] | None | NotGiven
Scorer or list of scorers used to evaluate model output.
model: str | FlowModel | Model | None | NotGiven
Default model for task (optional, defaults to eval model).
config: GenerateConfig | NotGiven
Model generation config for default model (does not apply to model roles). Will override config settings on the FlowSpec. Will be overridden by settings on the FlowModel.
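The precedence stated here (FlowSpec settings, then FlowTask, then FlowModel, with later layers winning) can be illustrated with plain dictionaries standing in for GenerateConfig. This is a sketch under assumptions: `effective_config` is a hypothetical helper, and treating None as "not set" is a simplification of how unset fields are represented.

```python
def effective_config(spec_config, task_config, model_config):
    """Layer generation settings from least to most specific."""
    merged = {}
    for layer in (spec_config, task_config, model_config):
        # Assumption: None means the field was left unset at that level.
        merged.update({key: value for key, value in layer.items() if value is not None})
    return merged


config = effective_config(
    spec_config={"temperature": 0.0, "max_tokens": 256},
    task_config={"temperature": 0.7, "max_tokens": None},
    model_config={"max_tokens": 1024},
)
print(config)
```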
model_roles: ModelRolesConfig | None | NotGiven
Named roles for use in get_model().
sandbox: SandboxEnvironmentType | None | NotGiven
Sandbox environment type (or optionally a str or tuple with a shorthand spec).
approval: str | ApprovalPolicyConfig | None | NotGiven
Tool use approval policies. Either a path to an approval policy config file or an approval policy config. Defaults to no approval policy.
epochs: int | FlowEpochs | None | NotGiven
Epochs to repeat samples for and optional score reducer function(s) used to combine sample scores (defaults to “mean”).
fail_on_error: bool | float | None | NotGiven
True to fail on first sample error (default); False to never fail on sample errors; a value between 0 and 1 to fail if that proportion of total samples fails; a value greater than 1 to fail the eval if that count of samples fails.
continue_on_fail: bool | None | NotGiven
True to continue running and only fail at the end if the fail_on_error condition is met; False to fail the eval immediately when the fail_on_error condition is met (default).
message_limit: int | None | NotGiven
Limit on total messages used for each sample.
token_limit: int | None | NotGiven
Limit on total tokens used for each sample.
time_limit: int | None | NotGiven
Limit on clock time (in seconds) for samples.
working_limit: int | None | NotGiven
Limit on working time (in seconds) for each sample. Working time includes model generation, tool calls, etc. but does not include time spent waiting on retries or shared resources.
version: int | str | NotGiven
Version of task (to distinguish evolutions of the task spec or breaking changes to it).
metadata: dict[str, Any] | None | NotGiven
Additional metadata to associate with the task.
sample_id: str | int | Sequence[str | int] | None | NotGiven
Evaluate specific sample(s) from the dataset.
flow_metadata: dict[str, Any] | None | NotGiven
Optional. Metadata stored in the flow config. Not passed to the task.
model_name: str | None | NotGiven
Get the model name from the model field.
Returns: The model name if set, otherwise None.
Decorators
after_load
Decorator to mark a function to be called after a FlowSpec is loaded.
The decorated function should have the following signature (all arguments are optional and may be omitted):
def after_flow_spec_loaded(
    spec: FlowSpec,
    files: list[str],
) -> None:
spec: The loaded FlowSpec.
files: List of file paths that were loaded to create the FlowSpec.
…
def after_load(func: Callable) -> Callable
func: Callable
The function to decorate.
Functions
agents_matrix
Create a list of agents from the product of lists of field values.
def agents_matrix(
*,
agent: str | FlowAgent | Sequence[str | FlowAgent],
**kwargs: Unpack[FlowAgentMatrixDict],
) -> list[FlowAgent]
agents_with
Set fields on a list of agents.
def agents_with(
*,
agent: str | FlowAgent | Sequence[str | FlowAgent],
**kwargs: Unpack[FlowAgentDict],
) -> list[FlowAgent]
configs_matrix
Create a list of generate configs from the product of lists of field values.
def configs_matrix(
*,
config: GenerateConfig | Sequence[GenerateConfig] | None = None,
**kwargs: Unpack[GenerateConfigMatrixDict],
) -> list[GenerateConfig]
config: GenerateConfig | Sequence[GenerateConfig] | None
The config or list of configs to matrix.
**kwargs: Unpack[GenerateConfigMatrixDict]
The lists of field values to matrix.
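The "product of lists of field values" behind the *_matrix functions is a Cartesian product: one output object per combination. The sketch below shows the idea with plain dictionaries and `itertools.product`; `matrix` is a hypothetical stand-in, not the library function, and the output ordering shown is that of `itertools.product` rather than a documented guarantee.

```python
from itertools import product


def matrix(base, **field_lists):
    """Return one dict per combination of the supplied field values."""
    names = list(field_lists)
    return [
        {**base, **dict(zip(names, combo))}
        for combo in product(*field_lists.values())
    ]


configs = matrix({"seed": 0}, temperature=[0.0, 1.0], max_tokens=[128, 256])
for config in configs:
    print(config)
```

Two values for each of two fields yield four configs, each carrying the shared `seed` from the base.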
configs_with
Set fields on a list of generate configs.
def configs_with(
*,
config: GenerateConfig | Sequence[GenerateConfig],
**kwargs: Unpack[GenerateConfigDict],
) -> list[GenerateConfig]
config: GenerateConfig | Sequence[GenerateConfig]
The config or list of configs to set fields on.
**kwargs: Unpack[GenerateConfigDict]
The fields to set on each config.
merge
Merge two flow objects.
def merge(base: _T, add: _T) -> _T
base: _T
The base object.
add: _T
The object to merge into the base. Values in this object will override those in the base.
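The override rule for merge can be pictured with dictionaries. This is only an analogy: `merge_dicts` is a hypothetical helper, and merging nested values recursively (rather than replacing them wholesale) is an assumption that the description above does not confirm.

```python
def merge_dicts(base, add):
    """Return a new dict where values from add override those in base."""
    merged = dict(base)
    for key, value in add.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested mappings are merged field by field.
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"name": "openai/gpt-4o", "config": {"temperature": 0.0, "max_tokens": 256}}
add = {"config": {"temperature": 0.7}}
result = merge_dicts(base, add)
print(result)
```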
models_matrix
Create a list of models from the product of lists of field values.
def models_matrix(
*,
model: str | FlowModel | Sequence[str | FlowModel],
**kwargs: Unpack[FlowModelMatrixDict],
) -> list[FlowModel]
models_with
Set fields on a list of models.
def models_with(
*,
model: str | FlowModel | Sequence[str | FlowModel],
**kwargs: Unpack[FlowModelDict],
) -> list[FlowModel]
solvers_matrix
Create a list of solvers from the product of lists of field values.
def solvers_matrix(
*,
solver: str | FlowSolver | Sequence[str | FlowSolver],
**kwargs: Unpack[FlowSolverMatrixDict],
) -> list[FlowSolver]
solver: str | FlowSolver | Sequence[str | FlowSolver]
The solver or list of solvers to matrix.
**kwargs: Unpack[FlowSolverMatrixDict]
The lists of field values to matrix.
solvers_with
Set fields on a list of solvers.
def solvers_with(
*,
solver: str | FlowSolver | Sequence[str | FlowSolver],
**kwargs: Unpack[FlowSolverDict],
) -> list[FlowSolver]
solver: str | FlowSolver | Sequence[str | FlowSolver]
The solver or list of solvers to set fields on.
**kwargs: Unpack[FlowSolverDict]
The fields to set on each solver.
tasks_matrix
Create a list of tasks from the product of lists of field values.
def tasks_matrix(
*,
task: str | FlowTask | Sequence[str | FlowTask],
**kwargs: Unpack[FlowTaskMatrixDict],
) -> list[FlowTask]
tasks_with
Set fields on a list of tasks.
def tasks_with(
*,
task: str | FlowTask | Sequence[str | FlowTask],
**kwargs: Unpack[FlowTaskDict],
) -> list[FlowTask]