flow step

Run workflow steps on eval logs.

Run workflow steps on eval logs.

Steps are discovered from built-in steps, _flow.py files in the current directory tree, and Python entry points.

You can also load steps from an arbitrary Python file:

flow step file.py --help          List steps in a file

flow step file.py STEP [ARGS]     Run a step from a file

flow step file.py@STEP [ARGS]     Shorthand for the above

Usage

flow step [OPTIONS] COMMAND [ARGS]...

Subcommands

copy Copy eval logs to a destination directory.
metadata Set or delete metadata fields on eval logs.
scan Run Inspect Scout scanners against the transcripts of eval logs.
tag Add or remove tags on eval logs.

flow step copy

Copy eval logs to a destination directory.

Usage

flow step copy [OPTIONS] [PATH]...

Options

Name Type Description Default
--dest text Destination directory (local or S3). _required
--source-prefix text Directory prefix to strip from source paths. Without this option, files are copied flat into the destination. When provided, preserves directory structure relative to the prefix. None
--overwrite boolean Overwrite existing files at the destination. False
--store text Resolve logs from a store. Use –store for the default store or –store PATH for a specific one. None
--filter text Log filter. Only process logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times (all must pass). None
--exclude text Log filter. Exclude logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times. None
--recursive / --no-recursive boolean Recurse into directories (default: true). No effect when –store is used. True
--dry-run boolean Preview changes without writing to disk. False
--display choice (full | rich | plain) Set the display mode (defaults to 'full'). full
--log-level choice (debug | trace | http | info | warning | error | critical | notset) Set the log level (defaults to 'warning'). warning
--help boolean Show this message and exit. False

flow step metadata

Set or delete metadata fields on eval logs.

Usage

flow step metadata [OPTIONS] [PATH]...

Options

Name Type Description Default
--set text Key-value pairs to set. ()
--remove text Keys to delete. ()
--author text Provenance author. Defaults to git user. None
--reason text Reason for the edit. None
--store text Resolve logs from a store. Use –store for the default store or –store PATH for a specific one. None
--filter text Log filter. Only process logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times (all must pass). None
--exclude text Log filter. Exclude logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times. None
--recursive / --no-recursive boolean Recurse into directories (default: true). No effect when –store is used. True
--dry-run boolean Preview changes without writing to disk. False
--display choice (full | rich | plain) Set the display mode (defaults to 'full'). full
--log-level choice (debug | trace | http | info | warning | error | critical | notset) Set the log level (defaults to 'warning'). warning
--help boolean Show this message and exit. False

flow step scan

Run Inspect Scout scanners against the transcripts of eval logs.

Usage

flow step scan [OPTIONS] [PATH]...

Options

Name Type Description Default
--scanners text Scanners to run, as a sequence, dict, ScanJob, ScanJobConfig, or a path to a Python/YAML file containing scanjob/scanner definitions. _required
-S text One or more scanjob or scanner arguments (e.g. -S arg=value) None
--scans text Location to write scan results to. Defaults to a ‘scans’ directory alongside the input logs (errors if logs are in multiple directories). None
-V, --validation text One or more validation sets to apply for scanners (e.g. -V myscanner:deception.csv) None
--model text Model used by default for llm scanners. None
--model-base-url text Base URL for for model API None
-M text One or more native model arguments (e.g. -M arg=value) None
--model-config text YAML or JSON config file with model arguments. None
--model-role text Named model role with model name or YAML/JSON config, e.g. –model-role critic=openai/gpt-4o or –model-role grader=“{model: mockllm/model, temperature: 0.5}” None
--max-transcripts integer Maximum number of transcripts to scan concurrently (defaults to 25) None
--max-processes integer Number of worker processes. Defaults to 4. None
--limit integer Limit number of transcripts to scan. None
--shuffle text Shuffle order of transcripts (pass a seed to make the order deterministic) None
--tags text Tags to associate with this scan job (comma separated) None
--metadata text Metadata to associate with this scan job (more than one –metadata argument can be specified). None
--cache text Policy for caching of model generations. Specify –cache to cache with 7 day expiration (7D). Specify an explicit duration (e.g. (e.g. 1h, 3d, 6M) to set the expiration explicitly (durations can be expressed as s, m, h, D, W, M, or Y). Alternatively, pass the file path to a YAML or JSON config file with a full CachePolicy configuration. None
--batch text Batch requests together to reduce API calls when using a model that supports batching (by default, no batching). Specify –batch to batch with default configuration, specify a batch size e.g. --batch=1000 to configure batches of 1000 requests, or pass the file path to a YAML or JSON config file with batch configuration. None
--max-connections integer Maximum number of concurrent connections to Model API (defaults to max_transcripts) None
--max-retries integer Maximum number of times to retry model API requests (defaults to unlimited) None
--timeout integer Model API request timeout in seconds (defaults to no timeout) None
--max-tokens integer The maximum number of tokens that can be generated in the completion (default is model specific) None
--temperature float What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. None
--top-p float An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. None
--top-k integer Randomly sample the next word from the top_k most likely next words. Anthropic, Google, HuggingFace, and vLLM only. None
--reasoning-effort choice (minimal | low | medium | high) Constrains effort on reasoning for reasoning models (defaults to medium). Open AI o-series and gpt-5 models only. None
--reasoning-tokens integer Maximum number of tokens to use for reasoning. Anthropic Claude models only. None
--reasoning-summary choice (concise | detailed | auto) Provide summary of reasoning steps (defaults to no summary). Use ‘auto’ to access the most detailed summarizer available for the current model. OpenAI reasoning models only. None
--reasoning-history choice (none | all | last | auto) Include reasoning in chat message history sent to generate (defaults to “auto”, which uses the recommended default for each provider) None
--fail-on-error boolean Re-raise exceptions instead of capturing them in results False
--store text Resolve logs from a store. Use –store for the default store or –store PATH for a specific one. None
--filter text Log filter. Only process logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times (all must pass). None
--exclude text Log filter. Exclude logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times. None
--recursive / --no-recursive boolean Recurse into directories (default: true). No effect when –store is used. True
--dry-run boolean Preview changes without writing to disk. False
--display choice (full | rich | plain) Set the display mode (defaults to 'full'). full
--log-level choice (debug | trace | http | info | warning | error | critical | notset) Set the log level (defaults to 'warning'). warning
--help boolean Show this message and exit. False

flow step tag

Add or remove tags on eval logs.

Usage

flow step tag [OPTIONS] [PATH]...

Options

Name Type Description Default
--add text Tags to add. ()
--remove text Tags to remove. ()
--author text Provenance author. Defaults to git user. None
--reason text Reason for the edit. None
--store text Resolve logs from a store. Use –store for the default store or –store PATH for a specific one. None
--filter text Log filter. Only process logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times (all must pass). None
--exclude text Log filter. Exclude logs that pass. Accepts a registered name, file.py@name, or a name defined in _flow.py. Can be used multiple times. None
--recursive / --no-recursive boolean Recurse into directories (default: true). No effect when –store is used. True
--dry-run boolean Preview changes without writing to disk. False
--display choice (full | rich | plain) Set the display mode (defaults to 'full'). full
--log-level choice (debug | trace | http | info | warning | error | critical | notset) Set the log level (defaults to 'warning'). warning
--help boolean Show this message and exit. False