---
title: "Benchmark virtual agents with scripted multi-turn conversations using Agent Evaluation"
description: "Run concurrent scripted conversations against a target agent to measure whether it stays on task, responds correctly, and behaves consistently across repeatable test cases."
verification: "listed"
source: "https://github.com/awslabs/agent-evaluation"
author: "AWS Labs"
publisher_type: "open_source_project"
category:
  - "Runbooks & Diagnostics"
framework:
  - "Custom Agents"
tool_ecosystem:
  github_repo: "awslabs/agent-evaluation"
  github_stars: 358
---

# Benchmark virtual agents with scripted multi-turn conversations using Agent Evaluation

Run concurrent scripted conversations against a target agent to measure whether it stays on task, responds correctly, and behaves consistently across repeatable test cases.

## Prerequisites

- A working Python environment
- A target agent to evaluate, exposed as an endpoint or through one of the supported integrations (a custom-target sketch follows this list)
- Optional AWS services such as Amazon Bedrock or Amazon SageMaker, depending on which target type you configure
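
If your agent is not behind one of the built-in AWS targets, the upstream docs describe a custom-target hook. The sketch below assumes the `BaseTarget`/`TargetResponse` interface from the custom-target documentation and a hypothetical HTTP endpoint for the agent under test; verify the import path and signatures against the release you install:

```
from agenteval.targets import BaseTarget, TargetResponse

# Hypothetical HTTP client for the agent under test; replace with your own.
import requests


class MyHttpAgentTarget(BaseTarget):
    """Forwards each scripted turn to a custom agent endpoint."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def invoke(self, prompt: str) -> TargetResponse:
        # Send the evaluator's prompt to the agent and wrap its reply.
        reply = requests.post(self.endpoint, json={"prompt": prompt}, timeout=30)
        reply.raise_for_status()
        return TargetResponse(response=reply.json()["response"])
```

The class is then referenced from the test plan by its import path (for example `type: my_targets.MyHttpAgentTarget`), with `endpoint` passed as a target parameter; confirm the exact wiring in the documentation linked below.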

## Installation

Choose whichever fits your setup:

1. Copy this skill folder into your local skills directory.
2. Clone the repo and symlink or copy the skill into your agent workspace.
3. Add the repo as a git submodule if you manage shared skills centrally.
4. Install it through your internal provisioning or packaging workflow.
5. Download the folder directly from GitHub and place it in your skills collection.

Install command or upstream instructions:

```
pip install agent-evaluation
```

After installing, follow the upstream documentation to configure an evaluator agent, define scripted test cases, and run benchmarks against your target agent.
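
Configuration lives in a YAML test plan (the docs call it `agenteval.yml`; `agenteval init` scaffolds one). The sketch below is modeled on the upstream quick-start example, with a Bedrock Agent target and one scripted multi-turn test; the IDs, test name, and conversation turns are placeholders, so check the schema against the linked documentation:

```
evaluator:
  model: claude-3
target:
  type: bedrock-agent
  bedrock_agent_id: YOUR_AGENT_ID        # placeholder
  bedrock_agent_alias_id: YOUR_ALIAS_ID  # placeholder
tests:
  retrieve_missing_documents:
    steps:
    - Ask the agent for a list of missing documents for claim-006.
    - Ask the agent how to submit one of the missing documents.
    expected_results:
    - The agent lists the missing documents and explains how to submit one.
```

Run the plan with `agenteval run`. Tests execute concurrently against the target, and the evaluator agent grades each conversation against its expected results.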

## Documentation

- https://awslabs.github.io/agent-evaluation/

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/benchmark-virtual-agents-with-scripted-multi-turn-conversations-using-agent-evaluation/)
