inspect_flow
Types
FlowAgent
Configuration for an Agent.
class FlowAgent(BaseModel, extra="forbid")
Attributes
namestr | None-
Name of the agent. Required to be set by the time the agent is created.
argsCreateArgs | None-
Additional args to pass to agent constructor.
flow_metadatadict[str, Any] | None-
Optional. Metadata stored in the flow config. Not passed to the agent.
typeLiteral['agent']-
Type needed to differentiate solvers and agents in solver lists.
FlowJob
Configuration for a flow job.
class FlowJob(BaseModel, extra="forbid")
Attributes
includesSequence[str | FlowInclude] | None-
List of other flow configs to include.
log_dirstr | None-
Output path for logging results (required to ensure that a unique storage scope is assigned). Must be set before running the flow job. If a relative path, it will be resolved relative to the most recent config file loaded with ‘load_job’ or the current working directory if ‘load_job’ was not used.
log_dir_create_uniquebool | None-
If True, create a new log directory by appending an _ and numeric suffix if the specified log_dir already exists. If the directory exists and has a _numeric suffix, that suffix will be incremented. If False, use the existing log_dir (which must be empty or have log_dir_allow_dirty=True). Defaults to False.
python_versionstr | None-
Python version to use in the flow virtual environment (e.g. ‘3.11’)
optionsFlowOptions | None-
Arguments for calls to eval_set.
dependencieslist[str] | None-
Dependencies to pip install. E.g. PyPI package specifiers or Git repository URLs.
envdict[str, str] | None-
Environment variables to set when running tasks.
defaultsFlowDefaults | None-
Default values for Inspect objects.
flow_metadatadict[str, Any] | None-
Optional. Metadata stored in the flow config. Not passed to the model.
tasksSequence[str | FlowTask] | None-
Tasks to run
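The log_dir_create_unique rule described above can be sketched in plain Python. This is an illustrative sketch only, not inspect_flow's implementation: the helper name next_unique_log_dir is hypothetical, a set of names stands in for the directories on disk, and starting the suffix at 1 is an assumption.

```python
from __future__ import annotations

import re


def next_unique_log_dir(log_dir: str, existing: set[str]) -> str:
    """Return log_dir, or a suffixed variant that is not in `existing`.

    Hypothetical helper: `existing` stands in for directories on disk.
    """
    candidate = log_dir
    while candidate in existing:
        match = re.fullmatch(r"(.+)_(\d+)", candidate)
        if match:
            # The name already ends in _<number>: increment that number.
            candidate = f"{match.group(1)}_{int(match.group(2)) + 1}"
        else:
            # No numeric suffix yet: append one (starting at 1 is an assumption).
            candidate = f"{candidate}_1"
    return candidate


print(next_unique_log_dir("logs/run", {"logs/run", "logs/run_1"}))
```

With log_dir_create_unique left at False, no renaming of this kind happens and the existing directory is used as-is.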
FlowDefaults
Default field values for Inspect objects. Will be overridden by more specific settings.
class FlowDefaults(BaseModel, extra="forbid")
Attributes
configFlowGenerateConfig | None-
Default model generation options. Will be overridden by settings on the FlowModel and FlowTask.
agentFlowAgent | None-
Field defaults for agents.
agent_prefixdict[str, FlowAgent] | None-
Agent defaults for agent name prefixes. E.g. {‘inspect/’: FlowAgent(…)}
modelFlowModel | None-
Field defaults for models.
model_prefixdict[str, FlowModel] | None-
Model defaults for model name prefixes. E.g. {‘openai/’: FlowModel(…)}
solverFlowSolver | None-
Field defaults for solvers.
solver_prefixdict[str, FlowSolver] | None-
Solver defaults for solver name prefixes. E.g. {‘inspect/’: FlowSolver(…)}
taskFlowTask | None-
Field defaults for tasks.
task_prefixdict[str, FlowTask] | None-
Task defaults for task name prefixes. E.g. {‘inspect_evals/’: FlowTask(…)}
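To make the prefix-keyed defaults above concrete, here is a minimal sketch of how general defaults, prefix defaults, and an object's own settings might be layered. It uses plain dicts rather than the Flow classes. The function name resolve_defaults, the rule that longer prefixes win, and the placeholder URL are assumptions for illustration. The only documented behaviour is that more specific settings override defaults.

```python
from __future__ import annotations


def resolve_defaults(
    name: str,
    general: dict,
    by_prefix: dict[str, dict],
    explicit: dict,
) -> dict:
    """Layer defaults for one named object; later layers win."""
    resolved = dict(general)
    # Apply every entry whose key is a prefix of the name, shortest first,
    # so that longer (more specific) prefixes win. This order is an assumption.
    for prefix in sorted(by_prefix, key=len):
        if name.startswith(prefix):
            resolved.update(by_prefix[prefix])
    # Settings on the object itself are the most specific layer.
    resolved.update(explicit)
    return resolved


model_defaults = {"memoize": True, "base_url": None}
model_prefix_defaults = {"openai/": {"base_url": "https://example.invalid/v1"}}

print(resolve_defaults("openai/gpt-4o", model_defaults, model_prefix_defaults, {"memoize": False}))
```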
FlowEpochs
Configuration for task epochs.
Number of epochs to repeat samples over and optionally one or more reducers used to combine scores from samples across epochs. If not specified the “mean” score reducer is used.
class FlowEpochs(BaseModel)
Attributes
epochsint-
Number of epochs.
reducerstr | list[str] | None-
One or more reducers used to combine scores from samples across epochs (defaults to “mean”)
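As a rough illustration of what a score reducer does, the sketch below combines per-epoch scores for each sample into a single value. The names reduce_epochs and REDUCERS are hypothetical, and the two entries are plain-Python stand-ins, not Inspect's reducer implementations.

```python
from __future__ import annotations

from statistics import mean

# Plain-Python stand-ins for named reducers (illustrative only).
REDUCERS = {"mean": mean, "max": max}


def reduce_epochs(
    scores_by_sample: dict[str, list[float]],
    reducer: str = "mean",
) -> dict[str, float]:
    """Combine each sample's per-epoch scores into a single score."""
    combine = REDUCERS[reducer]
    return {sample_id: combine(scores) for sample_id, scores in scores_by_sample.items()}


# Two samples: the first scored over four epochs, the second over two.
epoch_scores = {"sample-1": [1.0, 0.0, 1.0, 1.0], "sample-2": [0.0, 0.0]}
print(reduce_epochs(epoch_scores))
print(reduce_epochs(epoch_scores, reducer="max"))
```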
FlowGenerateConfig
Model generation options.
class FlowGenerateConfig(GenerateConfig, extra="forbid")
FlowModel
Configuration for a Model.
class FlowModel(BaseModel, extra="forbid")
Attributes
namestr | None-
Name of the model to use. Required to be set by the time the model is created.
rolestr | None-
Optional named role for model (e.g. for roles specified at the task or eval level). Provide a default as a fallback in the case where the role hasn’t been externally specified.
defaultstr | None-
Optional. Fallback model in case the specified model or role is not found. Should be a fully qualified model name (e.g. openai/gpt-4o).
configFlowGenerateConfig | None-
Configuration for model. Config values will override settings on the FlowTask and FlowJob.
base_urlstr | None-
Optional. Alternate base URL for model.
api_keystr | None-
Optional. API key for model.
memoizebool | None-
Use/store a cached version of the model based on the parameters to get_model(). Defaults to True.
model_argsCreateArgs | None-
Additional args to pass to model constructor.
flow_metadatadict[str, Any] | None-
Optional. Metadata stored in the flow config. Not passed to the model.
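The generation config can be set at three levels, and this reference states the precedence: job-level defaults are overridden by the FlowTask config, which is in turn overridden by the FlowModel config. The sketch below shows that lookup order with plain dicts and collections.ChainMap. The helper effective_config is hypothetical, and treating None as "not set" is an assumption.

```python
from __future__ import annotations

from collections import ChainMap


def effective_config(job_defaults: dict, task_config: dict, model_config: dict) -> dict:
    """Resolve generation options with model > task > job precedence."""

    def set_only(layer: dict) -> dict:
        # Drop None values so an unset field falls through to the next layer.
        return {key: value for key, value in layer.items() if value is not None}

    # ChainMap searches its maps left to right, so the first map wins.
    return dict(ChainMap(set_only(model_config), set_only(task_config), set_only(job_defaults)))


job_defaults = {"temperature": 0.0, "max_tokens": 1024}
task_config = {"temperature": 0.5, "max_tokens": None}
model_config = {"temperature": 1.0}

print(effective_config(job_defaults, task_config, model_config))
```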
FlowOptions
Evaluation options.
class FlowOptions(BaseModel, extra="forbid")
Attributes
retry_attemptsint | None-
Maximum number of retry attempts before giving up (defaults to 10).
retry_waitfloat | None-
Time to wait between attempts, increased exponentially (defaults to 30, resulting in waits of 30, 60, 120, 240, etc.). Wait time per-retry will in no case be longer than 1 hour.
retry_connectionsfloat | None-
Reduce max_connections at this rate with each retry (defaults to 1.0, which results in no reduction).
retry_cleanupbool | None-
Cleanup failed log files after retries (defaults to True).
sandboxSandboxEnvironmentType | None-
Sandbox environment type (or optionally a str or tuple with a shorthand spec).
sandbox_cleanupbool | None-
Cleanup sandbox environments after task completes (defaults to True).
tagslist[str] | None-
Tags to associate with this evaluation run.
metadatadict[str, Any] | None-
Metadata to associate with this evaluation run.
tracebool | None-
Trace message interactions with evaluated model to terminal.
displayDisplayType | None-
Task display type (defaults to ‘full’).
approvalstr | ApprovalPolicyConfig | None-
Tool use approval policies. Either a path to an approval policy config file or a list of approval policies. Defaults to no approval policy.
scorebool | None-
Score output (defaults to True).
log_levelstr | None-
Level for logging to the console: “debug”, “http”, “sandbox”, “info”, “warning”, “error”, “critical”, or “notset” (defaults to “warning”).
log_level_transcriptstr | None-
Level for logging to the log file (defaults to “info”).
log_formatLiteral['eval', 'json'] | None-
Format for writing log files (defaults to “eval”, the native high-performance format).
limitint | None-
Limit evaluated samples (defaults to all samples).
sample_shufflebool | int | None-
Shuffle order of samples (pass a seed to make the order deterministic).
fail_on_errorbool | float | None-
True to fail on first sample error (default); False to never fail on sample errors; Value between 0 and 1 to fail if a proportion of total samples fails. Value greater than 1 to fail eval if a count of samples fails.
continue_on_failbool | None-
True to continue running and only fail at the end if the fail_on_error condition is met. False to fail eval immediately when the fail_on_error condition is met (default).
retry_on_errorint | None-
Number of times to retry samples if they encounter errors (defaults to 3).
debug_errorsbool | None-
Raise task errors (rather than logging them) so they can be debugged (defaults to False).
max_samplesint | None-
Maximum number of samples to run in parallel (default is max_connections).
max_tasksint | None-
Maximum number of tasks to run in parallel (defaults to 10).
max_subprocessesint | None-
Maximum number of subprocesses to run in parallel (default is os.cpu_count()).
max_sandboxesint | None-
Maximum number of sandboxes (per-provider) to run in parallel.
log_samplesbool | None-
Log detailed samples and scores (defaults to True).
log_realtimebool | None-
Log events in realtime (enables live viewing of samples in inspect view) (defaults to True).
log_imagesbool | None-
Log base64 encoded version of images, even if specified as a filename or URL (defaults to False).
log_bufferint | None-
Number of samples to buffer before writing log file. If not specified, an appropriate default for the format and filesystem is chosen (10 in most cases, 100 for JSON logs on remote filesystems).
log_sharedbool | int | None-
Sync sample events to log directory so that users on other systems can see log updates in realtime (defaults to no syncing). Specify True to sync every 10 seconds, otherwise an integer to sync every n seconds.
bundle_dirstr | None-
If specified, the log viewer and logs generated by this eval set will be bundled into this directory.
bundle_overwritebool | None-
Whether to overwrite files in the bundle_dir. (defaults to False).
log_dir_allow_dirtybool | None-
If True, allow the log directory to contain unrelated logs. If False, ensure that the log directory only contains logs for tasks in this eval set (defaults to False).
eval_set_idstr | None-
ID for the eval set. If not specified, a unique ID will be generated.
bundle_url_mapdict[str, str] | None-
Replacements applied to bundle_dir to generate a URL. If provided and bundle_dir is set, the mapped URL will be written to stdout.
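The retry options above (retry_attempts, retry_wait, retry_connections) interact in a way that is easier to see as a schedule. The sketch below computes the wait and the connection limit for each retry from the documented defaults. It is not the eval_set implementation: the function name retry_schedule, the starting max_connections value of 10, the truncation to an integer, and the floor of one connection are assumptions.

```python
from __future__ import annotations


def retry_schedule(
    retry_attempts: int = 10,
    retry_wait: float = 30,
    retry_connections: float = 1.0,
    max_connections: int = 10,
) -> list[tuple[float, int]]:
    """Return (wait_seconds, max_connections) for each retry attempt."""
    schedule = []
    connections = float(max_connections)
    for attempt in range(retry_attempts):
        # The wait doubles with each attempt and never exceeds one hour.
        wait = min(retry_wait * 2**attempt, 3600)
        # Scale connections by the rate; keeping at least 1 is an assumption.
        connections = max(1.0, connections * retry_connections)
        schedule.append((wait, int(connections)))
    return schedule


for wait, connections in retry_schedule(retry_attempts=4, retry_connections=0.5):
    print(f"wait {wait}s, then retry with max_connections={connections}")
```

With the default rate of 1.0 the connection limit stays constant, matching the "no reduction" note on retry_connections.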
FlowSolver
Configuration for a Solver.
class FlowSolver(BaseModel, extra="forbid")
Attributes
namestr | None-
Name of the solver. Required to be set by the time the solver is created.
argsCreateArgs | None-
Additional args to pass to solver constructor.
flow_metadatadict[str, Any] | None-
Optional. Metadata stored in the flow config. Not passed to the solver.
FlowTask
Configuration for an evaluation task.
Tasks are the basis for defining and running evaluations.
class FlowTask(BaseModel, extra="forbid")
Attributes
namestr | None-
Task name. Any of registry name (“inspect_evals/mbpp”), file name (“./my_task.py”), or a file name and attr (“./my_task.py@task_name”). Required to be set by the time the task is created.
argsCreateArgs | None-
Additional args to pass to task constructor
solverstr | FlowSolver | list[str | FlowSolver] | FlowAgent | None-
Solver or list of solvers. Defaults to generate(), a normal call to the model.
modelstr | FlowModel | None-
Default model for task (Optional, defaults to eval model).
configFlowGenerateConfig | None-
Model generation config for default model (does not apply to model roles). Will override config settings on the FlowJob. Will be overridden by settings on the FlowModel.
model_rolesModelRolesConfig | None-
Named roles for use in get_model().
sandboxSandboxEnvironmentType | None-
Sandbox environment type (or optionally a str or tuple with a shorthand spec)
approvalstr | ApprovalPolicyConfig | None-
Tool use approval policies. Either a path to an approval policy config file or an approval policy config. Defaults to no approval policy.
epochsint | FlowEpochs | None-
Epochs to repeat samples for and optional score reducer function(s) used to combine sample scores (defaults to “mean”)
fail_on_errorbool | float | None-
True to fail on first sample error (default); False to never fail on sample errors; Value between 0 and 1 to fail if a proportion of total samples fails. Value greater than 1 to fail eval if a count of samples fails.
continue_on_failbool | None-
True to continue running and only fail at the end if the fail_on_error condition is met. False to fail eval immediately when the fail_on_error condition is met (default).
message_limitint | None-
Limit on total messages used for each sample.
token_limitint | None-
Limit on total tokens used for each sample.
time_limitint | None-
Limit on clock time (in seconds) for samples.
working_limitint | None-
Limit on working time (in seconds) for sample. Working time includes model generation, tool calls, etc. but does not include time spent waiting on retries or shared resources.
versionint | str | None-
Version of task (to distinguish evolutions of the task spec or breaking changes to it)
metadatadict[str, Any] | None-
Additional metadata to associate with the task.
sample_idstr | int | list[str | int] | None-
Evaluate specific sample(s) from the dataset.
flow_metadatadict[str, Any] | None-
Optional. Metadata stored in the flow config. Not passed to the task.
model_namestr | None-
Get the model name from the model field.
Returns: The model name if set, otherwise None.
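The fail_on_error field on FlowTask (and on FlowOptions) accepts a bool, a proportion, or a count. The sketch below shows one reading of those rules. The function should_fail is hypothetical, and the boundary choices (using >=, treating exactly 1 as a count, rejecting non-positive numbers) are assumptions that the reference does not spell out.

```python
from __future__ import annotations


def should_fail(fail_on_error: bool | float | None, errors: int, total: int) -> bool:
    """Decide whether the error count trips the fail_on_error condition."""
    if fail_on_error is None or fail_on_error is True:
        # Default behaviour: fail on the first sample error.
        return errors > 0
    if fail_on_error is False:
        # Never fail because of sample errors.
        return False
    if fail_on_error <= 0:
        raise ValueError("numeric fail_on_error must be positive")
    if fail_on_error < 1:
        # Between 0 and 1: a proportion of the total samples.
        return total > 0 and errors / total >= fail_on_error
    # 1 or greater: an absolute count of failed samples.
    return errors >= fail_on_error


print(should_fail(0.2, errors=3, total=10))
print(should_fail(5, errors=4, total=100))
```

continue_on_fail does not change this decision. Per the descriptions above, it only controls whether the eval stops immediately or at the end once the condition is met.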
Functions
agents_matrix
Create a list of agents from the product of lists of field values.
def agents_matrix(
    *,
    agent: str | FlowAgent | Sequence[str | FlowAgent],
    **kwargs: Unpack[FlowAgentMatrixDict],
) -> list[FlowAgent]
agents_with
Set fields on a list of agents.
def agents_with(
    *,
    agent: str | FlowAgent | Sequence[str | FlowAgent],
    **kwargs: Unpack[FlowAgentDict],
) -> list[FlowAgent]
configs_matrix
Create a list of generate configs from the product of lists of field values.
def configs_matrix(
    *,
    config: FlowGenerateConfig | Sequence[FlowGenerateConfig] | None = None,
    **kwargs: Unpack[FlowGenerateConfigMatrixDict],
) -> list[FlowGenerateConfig]
configFlowGenerateConfig | Sequence[FlowGenerateConfig] | None-
The config or list of configs to matrix.
**kwargsUnpack[FlowGenerateConfigMatrixDict]-
The lists of field values to matrix.
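The "product of lists of field values" used by the *_matrix functions can be illustrated with itertools.product. This sketch works on plain dicts, not FlowGenerateConfig objects. The helper name matrix and the output ordering are assumptions, and temperature, top_p and max_tokens are used only as familiar generation options.

```python
from __future__ import annotations

from itertools import product


def matrix(bases: list[dict], **field_values: list) -> list[dict]:
    """Return one dict per combination of base object and field values."""
    names = list(field_values)
    results = []
    for base in bases:
        # product() yields every combination, varying the last field fastest.
        for combo in product(*(field_values[name] for name in names)):
            results.append({**base, **dict(zip(names, combo))})
    return results


configs = matrix(
    [{"max_tokens": 256}],
    temperature=[0.0, 0.5, 1.0],
    top_p=[0.9, 1.0],
)
print(len(configs))  # 3 temperatures x 2 top_p values = 6
```

The result grows multiplicatively, so three values for one field and two for another give six objects per base.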
configs_with
Set fields on a list of generate configs.
def configs_with(
    *,
    config: FlowGenerateConfig | Sequence[FlowGenerateConfig],
    **kwargs: Unpack[FlowGenerateConfigDict],
) -> list[FlowGenerateConfig]
configFlowGenerateConfig | Sequence[FlowGenerateConfig]-
The config or list of configs to set fields on.
**kwargsUnpack[FlowGenerateConfigDict]-
The fields to set on each config.
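In contrast to the matrix functions, the *_with functions apply the same field values to every object and keep the list the same length. A minimal sketch with plain dicts follows. The helper name with_fields is hypothetical, and returning copies instead of mutating the inputs is an assumption.

```python
from __future__ import annotations


def with_fields(bases: list[dict], **fields) -> list[dict]:
    """Return copies of each base dict with the given fields set."""
    return [{**base, **fields} for base in bases]


base_configs = [{"temperature": 0.0}, {"temperature": 1.0}]
updated = with_fields(base_configs, max_tokens=512)
print(updated)
```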
merge
Merge two flow objects.
def merge(base: _T, add: _T) -> _T
base_T-
The base object.
add_T-
The object to merge into the base. Values in this object will override those in the base.
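The override rule for merge can be sketched with nested dicts. This is not the library's implementation, which operates on flow objects: the function merge_dicts is hypothetical, and both the recursion into nested dicts and the treatment of None as "not set" are assumptions.

```python
from __future__ import annotations


def merge_dicts(base: dict, add: dict) -> dict:
    """Return a new dict where values from `add` override those in `base`."""
    merged = dict(base)
    for key, value in add.items():
        if value is None:
            # Treat None as "not set" so it does not erase a base value.
            continue
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested mappings field by field.
            merged[key] = merge_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


base = {"name": "openai/gpt-4o", "config": {"temperature": 0.0, "max_tokens": 256}}
add = {"name": None, "config": {"temperature": 1.0}}
print(merge_dicts(base, add))
```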
models_matrix
Create a list of models from the product of lists of field values.
def models_matrix(
    *,
    model: str | FlowModel | Sequence[str | FlowModel],
    **kwargs: Unpack[FlowModelMatrixDict],
) -> list[FlowModel]
models_with
Set fields on a list of models.
def models_with(
    *,
    model: str | FlowModel | Sequence[str | FlowModel],
    **kwargs: Unpack[FlowModelDict],
) -> list[FlowModel]
solvers_matrix
Create a list of solvers from the product of lists of field values.
def solvers_matrix(
    *,
    solver: str | FlowSolver | Sequence[str | FlowSolver],
    **kwargs: Unpack[FlowSolverMatrixDict],
) -> list[FlowSolver]
solverstr | FlowSolver | Sequence[str | FlowSolver]-
The solver or list of solvers to matrix.
**kwargsUnpack[FlowSolverMatrixDict]-
The lists of field values to matrix.
solvers_with
Set fields on a list of solvers.
def solvers_with(
    *,
    solver: str | FlowSolver | Sequence[str | FlowSolver],
    **kwargs: Unpack[FlowSolverDict],
) -> list[FlowSolver]
solverstr | FlowSolver | Sequence[str | FlowSolver]-
The solver or list of solvers to set fields on.
**kwargsUnpack[FlowSolverDict]-
The fields to set on each solver.
tasks_matrix
Create a list of tasks from the product of lists of field values.
def tasks_matrix(
    *,
    task: str | FlowTask | Sequence[str | FlowTask],
    **kwargs: Unpack[FlowTaskMatrixDict],
) -> list[FlowTask]
tasks_with
Set fields on a list of tasks.
def tasks_with(
    *,
    task: str | FlowTask | Sequence[str | FlowTask],
    **kwargs: Unpack[FlowTaskDict],
) -> list[FlowTask]