Introduction

Agent Testing lets you run automated test scenarios against your pathways and personas to catch issues before they reach customers. A simulated test agent calls your pathway, follows a scripted behavior, and evaluates predefined assertions to measure whether your agent handled the conversation correctly.

Scenario-Based Testing

Define test scenarios with tester personas that simulate real callers — angry customers, voicemail systems, gatekeepers, and more. You can also import pre-built scenario templates.

Assertion Engine

10 assertion types including LLM judges, variable extraction checks, node traversal validation, and regex matching

Batch & Simulation Runs

Run multiple scenarios at once, or run the same scenario N times to detect flaky behavior

Fix Thoroughly

Iterative auto-fix loop that runs tests, analyzes failures, applies fixes, and retests until everything passes

Core Concepts

Scenarios

A scenario defines a test case for your pathway. Each scenario contains:
  • Tester Persona — A prompt that tells the simulated caller how to behave (e.g., “You are an angry customer who was charged twice”)
  • Assertions — Rules that evaluate whether the agent handled the conversation correctly
  • Configuration — Max turns, starting node, request data overrides, and more
Scenarios are scoped to a pathway (or persona) and persist across versions, so the same tests run against every version you promote.

Assertions

Assertions are the pass/fail criteria for a test run. Each scenario can have multiple assertions, each with a type, weight, and required flag.
  • LLM_JUDGE — Custom LLM prompt evaluates the conversation (boolean, score, or categorical)
  • BLAND_TONE — Built-in naturalness scoring: empathy, conciseness, flow, back-channeling
  • VARIABLE_EXTRACTED — Whether a specific variable was collected and optionally matches a value
  • NODE_REACHED — Whether the agent visited a specific pathway node
  • NODES_VISITED — Whether the agent visited a set of nodes (all or any)
  • WEBHOOK_TRIGGERED — Whether a webhook was called during the conversation
  • REGEX_MATCH — Regex pattern match against the transcript or extracted variables
  • STRING_CHECK — Simple string match (equals, contains, case-insensitive) against transcript or variables
  • CUSTOM_LLM — Custom LLM evaluation with your own system prompt and model
  • TRAVERSAL_MATCH — Compares the node traversal path against a reference path
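Putting the pieces together, a scenario bundles a persona with a set of typed, weighted assertions. The sketch below shows what such a definition might look like as a request body for POST /v1/agent-testing/scenarios; the field names (`tester_persona`, `max_turns`, the assertion shape) are illustrative assumptions, not the documented schema.

```python
# Hypothetical request body for POST /v1/agent-testing/scenarios.
# Field names are illustrative assumptions, not the documented schema.
scenario = {
    "name": "Double-charge complaint",
    "tester_persona": "You are an angry customer who was charged twice.",
    "max_turns": 12,
    "assertions": [
        {
            "type": "LLM_JUDGE",
            "name": "Agent acknowledged the duplicate charge",
            "prompt": "Did the agent acknowledge the duplicate charge and offer a resolution?",
            "weight": 2,
            "required": True,
        },
        {
            "type": "VARIABLE_EXTRACTED",
            "name": "Collected account email",
            "variable": "email",
            "weight": 1,
            "required": False,
        },
        {
            "type": "NODE_REACHED",
            "name": "Reached the refund node",
            "node_id": "refund_flow",
            "weight": 1,
            "required": True,
        },
    ],
}

print(len(scenario["assertions"]))
```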

Test Runs

When you execute a scenario, a test run is created. The system:
  1. Creates a simulated call
  2. Runs the tester persona against your pathway, up to the configured maximum number of turns
  3. Captures the full chat history, extracted variables, and nodes visited
  4. Evaluates all assertions against the conversation
  5. Computes an overall score (weighted average of assertion scores)
Run statuses: PENDING | RUNNING | PASSED | FAILED | ERROR | CANCELLED
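Step 5's weighted average can be sketched as follows. This assumes each assertion result carries a `score` in [0, 1], a `weight`, and a `required` flag, and that a failed required assertion fails the run regardless of the average; the exact semantics are an assumption, not the documented behavior.

```python
def overall_score(results):
    """Weighted average of assertion scores. A required assertion
    that scores below 1.0 fails the run regardless of the average.
    The result shape (score/weight/required) is an assumed schema."""
    total_weight = sum(r["weight"] for r in results)
    score = sum(r["score"] * r["weight"] for r in results) / total_weight
    passed = all(r["score"] >= 1.0 for r in results if r["required"])
    return score, passed

results = [
    {"score": 1.0, "weight": 2, "required": True},   # e.g. an LLM judge that passed
    {"score": 0.5, "weight": 1, "required": False},  # e.g. a partial tone score
]
score, passed = overall_score(results)
print(round(score, 3), passed)  # 0.833 True
```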

Batches

A batch runs multiple scenarios together. This is useful for running your full test suite against a pathway version before promoting it to production. Batches can also be triggered automatically during pathway promotion — scenarios marked is_required_for_promotion act as a gate that must pass before the version goes live.
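The promotion gate reduces to a simple check over the batch results: every scenario marked is_required_for_promotion must have passed. A minimal sketch, with an assumed result shape:

```python
def promotion_allowed(batch_results):
    """Allow promotion only if every gate scenario passed.
    Each result is assumed to look like
    {"scenario": str, "is_required_for_promotion": bool, "status": str}."""
    gates = [r for r in batch_results if r["is_required_for_promotion"]]
    return all(r["status"] == "PASSED" for r in gates)

batch = [
    {"scenario": "angry-customer", "is_required_for_promotion": True, "status": "PASSED"},
    {"scenario": "voicemail", "is_required_for_promotion": False, "status": "FAILED"},
]
# The only gate scenario passed, so the failed non-gate test does not block promotion.
print(promotion_allowed(batch))  # True
```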

Scenario Categories

Bland provides out-of-the-box templates for common testing scenarios. You can clone these to your pathway and customize them.
  • Tests how your agent handles reaching voicemail — leaving a coherent message, handling a full mailbox, etc.
  • Simulates Google Call Screen or a suspicious gatekeeper. Tests whether your agent identifies itself clearly to get through.
  • Progressively escalating frustration. Tests de-escalation skills — empathy, ownership, and professionalism under pressure.
  • Abusive language and personal attacks. Tests whether your agent maintains professional boundaries and escalates appropriately.
  • Rambling, off-topic caller who isn’t sure what they need. Tests patience, clarification questions, and gentle redirection.
  • Cooperative callers following the expected flow — appointment scheduling, information requests, etc. Validates your core conversation logic.
  • Unusual situations that test the boundaries of your pathway’s handling.
  • Your own custom scenarios tailored to your specific use case.

Getting Started

1

Open the Scenarios Panel

From the pathway editor, click the Scenarios button in the top toolbar to open the scenarios panel. This shows your existing scenarios organized by Gates (required for promotion) and Tests (standard scenarios), along with recent run history.

(Screenshot: pathway editor with the Scenarios panel open, showing gates, tests, and recent runs)
2

Create a Scenario

Click + New to open the scenario creation form. You can start from one of the 9 pre-built scenario templates, generate a scenario from a call log, or build one from scratch.

Fill in the scenario name, describe the test caller persona (how the simulated caller should behave), toggle Bland Tone scoring, and add assertions to define your pass/fail criteria.

(Screenshot: full pathway editor view with the create-scenario form open)

Each assertion has a type (LLM Judge, Variable Check, Node Reached, Regex, etc.), an optional name, and a Required toggle. Click Test & Save to run the scenario immediately, or Save to save it for later.
3

Run and Review Results

Run a single scenario with the Run button, or click Run All to execute your entire test suite. Results appear on the pathway splash page showing pass/fail status, score percentages, assertion breakdowns, tone scores, and AI-generated reasoning.

(Screenshot: full pathway page showing test results with passed scenarios, scores, and gate status)
4

Gate Pathway Promotion

Toggle the Production gate switch on critical scenarios. When enabled, these scenarios must pass before a pathway version can be promoted to production. The gate count is shown in the summary bar (e.g., “1/1 gates”).

(Screenshot: expanded scenario showing assertion results, tone score, and production gate toggle)

Simulation Sets

Simulation sets run the same scenario multiple times to detect flaky behavior — cases where your pathway passes sometimes but fails other times. The results include per-scenario statistics:
  • Pass rate across all runs
  • Score distribution (mean, median, stddev, min, max)
  • Flakiness detection — scenarios flagged as unreliable
  • Common failure modes — recurring issues across runs
  • Confidence level (low, medium, high) based on sample size
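The per-scenario statistics above can be computed from the raw run scores roughly as follows. The pass threshold, the flakiness rule (any pass rate strictly between 0 and 1), and the confidence cutoffs are illustrative assumptions, not Bland's documented values.

```python
import statistics

def simulation_stats(scores, pass_threshold=0.7):
    """Summarize N runs of one scenario. The threshold, flakiness
    rule, and confidence cutoffs are illustrative assumptions."""
    passes = [s >= pass_threshold for s in scores]
    pass_rate = sum(passes) / len(scores)
    return {
        "pass_rate": pass_rate,
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stddev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
        # Flaky: passes sometimes, fails other times.
        "flaky": 0 < pass_rate < 1,
        "confidence": "high" if len(scores) >= 20
                      else "medium" if len(scores) >= 5 else "low",
    }

stats = simulation_stats([0.9, 0.4, 0.8, 0.95, 0.85])
print(stats["pass_rate"], stats["flaky"], stats["confidence"])  # 0.8 True medium
```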

Tornado Mode

Tornado mode is an iterative auto-fix loop for your pathway. It:
  1. Runs all specified test scenarios
  2. Analyzes any failures using the Norm analyzer
  3. Generates a fix plan (prompt changes, node config updates, flow changes)
  4. Applies the fixes to a forked pathway version
  5. Re-runs the tests
  6. Repeats until all tests pass or max iterations are reached
Tornado statuses: RUNNING | COMPLETED_ALL_PASSED | COMPLETED_PARTIAL | TIMEOUT | STUCK | CANCELLED | ERROR
Only one tornado session can run per pathway at a time; starting a new session while one is active returns a 409 Conflict.
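In outline, the loop looks like the sketch below. The `run_tests`, `analyze`, and `apply_fixes` callables are stand-ins for Bland's internal test execution, Norm analysis, and pathway-editing steps, which are not exposed here; only the control flow mirrors the numbered steps above.

```python
def tornado(run_tests, analyze, apply_fixes, max_iterations=5):
    """Iterative auto-fix loop: run tests, analyze failures, apply
    fixes, and retest. The callables are stand-ins for internal steps."""
    for _ in range(max_iterations):
        failures = run_tests()
        if not failures:
            return "COMPLETED_ALL_PASSED"
        fix_plan = analyze(failures)
        apply_fixes(fix_plan)
    return "COMPLETED_PARTIAL"

# Toy harness: two failing scenarios that get "fixed" one per iteration.
remaining = ["bad-greeting", "missed-variable"]
status = tornado(
    run_tests=lambda: list(remaining),
    analyze=lambda failures: {"fix": failures[0]},
    apply_fixes=lambda plan: remaining.remove(plan["fix"]),
)
print(status)  # COMPLETED_ALL_PASSED
```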

Analytics

Agent testing provides two levels of analytics for your pathway:

Basic Analytics

Pass rate, failure rate, average score, and per-run details over a configurable time window (default 30 days, max 365).

Enhanced Analytics

Advanced metrics including:
  • Health Score — Overall pathway testing health
  • Reliability Score — Consistency of test results
  • Trend Analysis — Weekly direction (improving, declining, stable) with daily pass rates
  • Node Failure Heatmap — Which pathway nodes have the highest failure rates
  • Weakest Link — The single node most responsible for test failures
  • Sankey Flow Data — Visual flow from scenarios through nodes to outcomes (pass/fail)

Norm Analysis

When a test run fails, you can trigger Norm analysis to get AI-powered suggestions for fixing your pathway. The analyzer examines the conversation, identifies the root cause, and suggests specific changes:
  • prompt_change — Modify a node’s prompt text
  • node_config — Change node configuration settings
  • flow_change — Restructure pathway routing
  • tone_improvement — Adjust conversational style
Each suggestion includes a target node, description, suggested change text, and confidence score.
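A returned suggestion might look like the following; the field names are illustrative assumptions based on the description above, not a documented schema.

```python
# Hypothetical shape of one Norm suggestion; field names are assumptions.
suggestion = {
    "type": "prompt_change",  # one of the four suggestion types above
    "target_node": "greeting",
    "description": "The agent never acknowledges the caller's frustration.",
    "suggested_change": "Apologize for the inconvenience before asking for account details.",
    "confidence": 0.82,
}
print(suggestion["type"])  # prompt_change
```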

API Reference

For full endpoint details, see the Scenarios API Reference.
POST /v1/agent-testing/scenarios — Create a test scenario
GET /v1/agent-testing/scenarios — List scenarios
GET /v1/agent-testing/scenarios/:id — Get scenario details
PUT /v1/agent-testing/scenarios/:id — Update a scenario
DELETE /v1/agent-testing/scenarios/:id — Delete a scenario
GET /v1/agent-testing/templates — List out-of-the-box templates
POST /v1/agent-testing/templates/:id/clone — Clone a template
POST /v1/agent-testing/scenarios/generate-from-call — Generate a scenario from a call transcript
POST /v1/agent-testing/scenarios/:id/run — Run a single scenario
POST /v1/agent-testing/batch-run — Run multiple scenarios
GET /v1/agent-testing/runs — List test runs
GET /v1/agent-testing/runs/:id — Get run details
GET /v1/agent-testing/batches/:id — Get batch details
POST /v1/agent-testing/runs/:id/analyze — Analyze a failed run
GET /v1/agent-testing/analytics/:pathwayId — Basic analytics
GET /v1/agent-testing/analytics/:pathwayId/enhanced — Enhanced analytics
POST /v1/agent-testing/simulation-sets — Create a simulation set
GET /v1/agent-testing/simulation-sets/:id — Get simulation set results
POST /v1/agent-testing/tornado/start — Start tornado mode
GET /v1/agent-testing/tornado/active — Get active tornado session
GET /v1/agent-testing/tornado/:id/status — Get tornado progress
POST /v1/agent-testing/tornado/:id/cancel — Cancel tornado session
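Calling these endpoints from code is a thin wrapper over HTTP. The sketch below builds (but does not send) a request to run a scenario; the base URL, the lowercase `authorization` header, and the `scn_123` scenario ID are assumptions for illustration — consult the Scenarios API Reference for the authoritative details.

```python
import json
import urllib.request

BASE_URL = "https://api.bland.ai"  # assumed base URL; verify in the API reference

def build_request(method, path, api_key, body=None):
    """Construct (without sending) a request to an agent-testing endpoint.
    Auth header name and base URL are assumptions, not documented values."""
    return urllib.request.Request(
        url=BASE_URL + path,
        method=method,
        headers={"authorization": api_key, "content-type": "application/json"},
        data=json.dumps(body).encode() if body is not None else None,
    )

# "scn_123" is a placeholder scenario ID for the :id path segment.
req = build_request("POST", "/v1/agent-testing/scenarios/scn_123/run", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```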