---
title: "Investigate production incidents across Kubernetes and cloud signals with HolmesGPT"
description: "Use HolmesGPT when an on-call agent needs one investigation loop that pulls alerts, logs, metrics, and infrastructure context from multiple systems and returns a root-cause path instead of forcing a human to hop across separate observability products."
verification: "security_reviewed"
source: "https://github.com/HolmesGPT/holmesgpt"
author: "HolmesGPT"
publisher_type: "organization"
category:
  - "Runbooks & Diagnostics"
framework:
  - "Custom Agents"
tool_ecosystem:
  github_repo: "HolmesGPT/holmesgpt"
  github_stars: 2265
---

# Investigate production incidents across Kubernetes and cloud signals with HolmesGPT

Use HolmesGPT when an on-call agent needs one investigation loop that pulls alerts, logs, metrics, and infrastructure context from multiple systems and returns a root-cause path instead of forcing a human to hop across separate observability products.

## Prerequisites

A HolmesGPT CLI or Kubernetes operator deployment, credentials for one supported LLM provider, and the connected observability/toolset integrations you want investigations to draw on.

## Installation

Choose whichever fits your setup:

1. Copy this skill folder into your local skills directory.
2. Clone the repo and symlink or copy the skill into your agent workspace.
3. Add the repo as a git submodule if you manage shared skills centrally.
4. Install it through your internal provisioning or packaging workflow.
5. Download the folder directly from GitHub and place it in your skills collection.

Install command or upstream instructions:

Follow the HolmesGPT installation docs to deploy the CLI or Kubernetes operator, configure an LLM provider, and connect the relevant observability toolsets before running incident investigations.
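The setup flow above can be sketched as a quick local check. The Homebrew tap name, package name, and `holmes ask` invocation in the comments are assumptions drawn from the upstream docs, not verified commands; confirm each against the HolmesGPT installation guide before relying on them.

```shell
# Minimal sketch, assuming the CLI binary is named `holmes` and the
# install path documented upstream (names below are assumptions):
#
#   brew tap robusta-dev/homebrew-holmesgpt   # hypothetical tap name
#   brew install holmesgpt
#   export OPENAI_API_KEY=...                 # or another supported provider
#   holmes ask "why is the payments deployment crash-looping?"
#
# Sanity-check that the CLI is on PATH before wiring it into an agent:
if command -v holmes >/dev/null 2>&1; then
  echo "holmes CLI is installed"
else
  echo "holmes CLI is not installed; follow the installation docs first"
fi
```

Running this check first keeps the agent workflow from failing mid-investigation because the CLI or provider credentials were never configured.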

## Documentation

- https://holmesgpt.dev/

## Source

- [Agent Skill Exchange](https://agentskillexchange.com/skills/investigate-production-incidents-across-kubernetes-and-cloud-signals-with-holmesgpt/)
