---
name: "Evaluate long-horizon agents against WildClawBench"
slug: "evaluate-long-horizon-agents-against-wildclawbench"
description: "Use WildClawBench to benchmark agents on hard end-to-end OpenClaw tasks covering tool orchestration, multimodal work, coding, safety, and long-horizon planning."
github_stars: 359
verification: "listed"
source: "https://github.com/InternLM/WildClawBench"
author: "InternLM"
publisher_type: "research_open_source"
category: "Templates & Workflows"
framework: "OpenClaw"
tool_ecosystem:
  github_repo: "InternLM/WildClawBench"
  github_stars: 359
---

# Evaluate long-horizon agents against WildClawBench

Use WildClawBench to benchmark agents on hard end-to-end OpenClaw tasks covering tool orchestration, multimodal work, coding, safety, and long-horizon planning.

## Prerequisites

- WildClawBench assets
- An OpenClaw environment
- The target agent or model under test

## Installation

This listing does not include verified installation or usage instructions; none could be extracted from the source automatically. Consult the upstream repository for setup steps, and review the project before running this skill in a sensitive workflow.

- Source: https://github.com/InternLM/WildClawBench

## Documentation

- https://internlm.github.io/WildClawBench/

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/evaluate-long-horizon-agents-against-wildclawbench/)