Flow Concepts

The primary interface for defining a workflow in Flow is a FlowSpec. A FlowSpec defines the log directory, tasks, models, and other options that make up the spec. When Inspect Flow processes the FlowSpec, it resolves the configuration into a list of evaluation tasks to be run. These tasks are provided to Inspect’s eval_set() function along with any additional configuration, and Inspect executes the tasks.

Inspect Flow mirrors Inspect AI's object model with corresponding Flow types: FlowTask corresponds to Task, FlowModel to Model, FlowSolver to Solver, FlowAgent to Agent, and FlowEpochs to Epochs.

Config Files

Flow configuration files are Python files where the last expression is the spec to run (typically a FlowSpec, as in the examples below).

Since config files are standard Python code, you can use all Python features including variables, loops, comprehensions, imports, and comments. Flow configs are just Python!
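For example, you can build the task list with an ordinary comprehension. A minimal sketch (the second model name is just an illustration):

from inspect_flow import FlowSpec, FlowTask

# One task per model; the last expression is still the FlowSpec
models = ["openai/gpt-5", "anthropic/claude-3-5-sonnet"]

FlowSpec(
    log_dir="logs",
    tasks=[
        FlowTask(name="inspect_evals/gpqa_diamond", model=m)
        for m in models
    ],
)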

Note

This last expression requirement only applies when loading configs from files (via CLI or load_spec()). When using the Python API directly with run(), you pass the FlowSpec object programmatically—no file or last expression needed.
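For example, a minimal programmatic invocation might look like this (assuming run() is exported from the inspect_flow package):

from inspect_flow import FlowSpec, FlowTask, run

run(
    FlowSpec(
        log_dir="logs",
        tasks=[FlowTask(name="inspect_evals/gpqa_diamond", model="openai/gpt-5")],
    )
)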

Basic Example

In Welcome - Basic Example, we showed a simple FlowSpec. Let’s break down what’s happening:

config.py
from inspect_flow import FlowSpec, FlowTask

FlowSpec(
    log_dir="logs",
    tasks=[
        FlowTask(
            name="inspect_evals/gpqa_diamond",
            model="openai/gpt-5",
        ),
    ],
)
1. log_dir: Specifies the directory for storing logs
2. tasks: Lists the evaluation tasks to run
3. name: Specifies a task from the registry by name
4. model: Specifies the model to evaluate by name

To run the task, first ensure you have the necessary dependencies installed (inspect-evals and openai for this example).
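Assuming the standard PyPI package names, that means something like:

pip install inspect-evals openai

Then run the following command in your shell: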

flow run config.py

Alternatively, use the --venv flag to automatically install dependencies in an isolated virtual environment:

flow run config.py --venv

[Screenshot: progress bar in the terminal]

What happens when you run this (default in-process mode)?

  1. Flow loads the gpqa_diamond task from the inspect_evals registry
  2. Runs the evaluation with GPT-5 in the current Python process
  3. Stores results in logs/

When using --venv, Flow additionally creates an isolated virtual environment, installs dependencies automatically, executes the evaluation in that environment, and cleans up the temporary environment after completion.

Jobs

FlowSpec is the primary interface for defining Flow workflows. All Flow operations—including parameter sweeps and matrix expansions—ultimately produce a list of tasks that FlowSpec executes.

Required fields:

• log_dir: Output path for logging results. Supports S3 paths (e.g., s3://bucket/path)

Optional fields:

• tasks: List of tasks to run (FlowTask objects, Inspect AI Task objects, or strings). Default: None
• includes: List of other flow configs to include (paths as strings). Relative paths are resolved relative to the config file (CLI) or the base_dir argument (API). Additionally, any _flow.py files in the same directory or parent directories are automatically included. Default: None
• log_dir_create_unique: If True, append a numeric suffix to log_dir if it already exists. If False, use the existing directory (which must only contain logs from tasks in the spec, or have options.log_dir_allow_dirty=True). Default: False
• execution_type: Execution environment: "inproc" runs tasks in the current Python process, "venv" runs tasks in an isolated virtual environment. Default: "inproc"
• python_version: Python version for the isolated virtual environment when execution_type="venv" (e.g., "3.11"). Default: same as the current environment
• dependencies: FlowDependencies object configuring package installation in venv mode. Controls dependency file detection, additional packages, and auto-detection behavior. Ignored in inproc mode. Default: auto-detect from pyproject.toml or requirements.txt, and config object names (venv mode only)
• env: Environment variables to set when running tasks. Default: None
• options: Runtime options passed to eval_set (see the FlowOptions reference). Default: None
• defaults: Default values applied across tasks, models, solvers, and agents (see Defaults and the FlowDefaults reference). Default: None
• flow_metadata: Metadata stored in the flow config (not passed to Inspect AI; see flow_metadata). Default: None
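As an illustration, here is a sketch of a spec that sets several of these fields (the S3 bucket, included config path, and API key value are placeholders):

from inspect_flow import FlowSpec

FlowSpec(
    log_dir="s3://my-bucket/logs",          # placeholder bucket
    log_dir_create_unique=True,             # append a numeric suffix if the directory exists
    execution_type="venv",                  # run tasks in an isolated virtual environment
    python_version="3.11",
    includes=["shared_flow.py"],            # placeholder path to another flow config
    env={"OPENAI_API_KEY": "..."},          # placeholder key value
    tasks=["inspect_evals/gpqa_diamond"],
)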

Tasks

FlowTask defines the configuration for a single evaluation. Each FlowTask maps to an Inspect AI Task and accepts the same parameters—including the model to evaluate, solver chain, generation config, sandbox environment, and dataset filters.

Specification

Flow supports multiple ways to specify tasks, from simple registry names to programmatic Task objects. Choose the approach that best fits your needs:

  • Registry names and file paths: Simple string-based references for tasks from packages or local files
  • Task factory functions: Use FlowTask(factory=...) when you have a function that returns a Task and want to override parameters like model or config
  • Direct Task objects: Pass Task objects directly to FlowSpec.tasks when you don’t need Flow-level parameter overrides

Registry Name

# Tasks from installed packages
FlowTask(
    name="inspect_evals/gpqa_diamond",
    model="openai/gpt-5"
)

File Path

# Auto-discovers `@task` decorated functions in the specified file and creates a task for each of them
FlowTask(
    name="./my_task.py",
    model="openai/gpt-5"
)

File with Function

# Explicitly selects a specific function from the file
FlowTask(
    name="./my_task.py@custom_eval",
    model="openai/gpt-5"
)

Task Factory Function

# Use a factory function that returns an Inspect AI Task
from inspect_ai import Task, task
from inspect_ai.dataset import example_dataset
from inspect_ai.solver import generate
from inspect_flow import FlowTask

@task  # @task decorator makes the function available in the registry and ensures that tasks with different args are considered unique
def my_task_factory():
    return Task(dataset=example_dataset("security_guide"), solver=generate())

FlowTask(
    factory=my_task_factory,
    model="openai/gpt-5",  # Override the model
)

Direct Task Object in FlowSpec

direct_task_object.py
# Pass Task objects directly to FlowSpec.tasks
# Use this when you don't need to override task parameters
from inspect_ai import Task, task
from inspect_ai.dataset import example_dataset
from inspect_ai.solver import generate
from inspect_flow import FlowSpec


@task  # Optional: use @task decorator to make function available in the registry
def my_custom_task():
    return Task(
        dataset=example_dataset("security_guide"),
        solver=generate(),
    )


FlowSpec(
    log_dir="logs",
    tasks=[
        my_custom_task(),
        "inspect_evals/gpqa_diamond",
    ],
)

Configuration

FlowTask accepts parameters that map to Inspect AI Task fields. The examples below show commonly used fields; see the FlowTask reference documentation for the complete list of available parameters.

from inspect_ai.model import GenerateConfig
from inspect_flow import FlowExtraArgs, FlowTask

FlowTask(
    name="inspect_evals/mmlu_0_shot",
    model="openai/gpt-5",
    epochs=3,
    config=GenerateConfig(
        temperature=0.7,
        max_tokens=1000,
    ),
    solver="chain_of_thought",
    args={"subject": "physics"},
    sandbox="docker",
    sample_id=[0, 1, 2],
    extra_args=FlowExtraArgs(
        solver={"max_attempts": 3}
    ),
)
1. Task name or factory — Use name to specify a task by registry name ("inspect_evals/mmlu"), file path ("./task.py"), or file with function ("./task.py@eval_fn"). Alternatively, use factory to provide a callable that returns an Inspect AI Task object (see the Task Factory Function example above). When using factory, you can still override task parameters like model, solver, or config via other FlowTask fields.
2. Model — Maps to Inspect AI Task.model. Optional model for this task. If not specified, uses the model from the INSPECT_EVAL_MODEL environment variable. Can be a string ("openai/gpt-5"), a FlowModel object for advanced configuration, or an Inspect AI Model object.
3. Epochs — Maps to Inspect AI Task.epochs. Number of times to repeat evaluation over the dataset samples. Can be an integer (epochs=3) or a FlowEpochs object to specify custom reducer functions (FlowEpochs(epochs=3, reducer="median")). By default, scores are combined using the "mean" reducer across epochs.
4. Generation config — Maps to Inspect AI Task.config (GenerateConfig). Model generation parameters like temperature, max_tokens, top_p, reasoning_effort, etc. These settings override config on FlowSpec.config but are overridden by settings on FlowModel.config.
5. Solver chain — Maps to Inspect AI Task.solver. The algorithm(s) for solving the task. Can be a string ("chain_of_thought"), a FlowSolver object, a FlowAgent object, an Inspect AI Solver or Agent object, or a list of solvers for chaining. Defaults to generate() if not specified.
6. Task arguments — Maps to task function parameters. Dictionary of arguments passed to the task constructor or @task decorated function. Enables parameterization of tasks (e.g., selecting dataset subsets, configuring difficulty levels).
7. Sandbox environment — Maps to Inspect AI Task.sandbox. Can be a string ("docker", "local"), a tuple with additional config, or a SandboxEnvironmentType object.
8. Sample selection — Evaluate specific samples from the dataset. Accepts a single ID (sample_id=0), a list of IDs (sample_id=[0, 1, 2]), or a list of string IDs.
9. Extra arguments — Additional arguments to pass when creating Inspect AI objects (models, solvers, agents, scorers) for this specific task. Useful for per-task customization, such as providing different tools to an agent depending on the task. Values in extra_args override any args specified in the object's own args field.

Models

When specifying a task, you can provide the model as a model name string, a FlowModel object, or an Inspect AI Model object. When you use a plain string, the model's default configuration is used. For example:

FlowTask(name="task", model="openai/gpt-5")

Using FlowModel allows you to specify additional parameters for the model, such as a custom GenerateConfig, API endpoint, or API key for a specific model. For example:

from inspect_ai.model import GenerateConfig
from inspect_flow import FlowModel, FlowTask

FlowTask(
    name="task",
    model=FlowModel(
        name="openai/gpt-5",
        config=GenerateConfig(
            reasoning_effort="medium",
            max_connections=10,
        ),
        base_url="https://custom-endpoint.com",
        api_key="${CUSTOM_API_KEY}",
    )
)

You can also pass an Inspect AI Model object directly when you need programmatic model configuration:

from inspect_ai.model import GenerateConfig, get_model

# Create a Model instance with custom configuration
custom_model = get_model(
    "openai/gpt-5",
    config=GenerateConfig(reasoning_effort="high"),
    base_url="https://custom-endpoint.com"
)

FlowTask(
    name="task",
    model=custom_model
)

Model Roles

For agent evaluations with multiple model roles, use model_roles to specify which model to use for each role. For example:

# For agent evaluations with multiple roles
FlowTask(
    name="multi_agent_task",
    model_roles={
        "assistant": "openai/gpt-5",
        "critic": "anthropic/claude-3-5-sonnet",
    }
)