MCP Tool Contracts
Harrier exposes four production MCP tools:
| Tool | Purpose |
|---|---|
| harrier_start_emr_investigation | Start a runtime-aware EMR/Spark investigation. |
| harrier_get_investigation_report | Fetch a stored investigation report. |
| harrier_get_evidence | Fetch filtered evidence for an investigation. |
| harrier_prepare_pr | Prepare a dry-run PR preview or create a guarded PR. |
Demo-lab scenario commands are not MCP tools.
Server
Default local endpoint:
http://127.0.0.1:8000/mcp
Default transport:
streamable-http
Configurable with:
HARRIER_MCP_TRANSPORT
HARRIER_MCP_HOST
HARRIER_MCP_PORT
HARRIER_MCP_PATH
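As a rough illustration of how these variables relate to the default endpoint, the sketch below resolves a URL from an environment mapping. The defaults come from the text above; the fallback behaviour and the `resolve_mcp_settings` helper are assumptions for illustration, not Harrier's actual startup code.

```python
import os

# Defaults as documented above. How Harrier itself reads these variables
# is not shown in this document, so the fallback logic is an assumption.
DEFAULTS = {
    "HARRIER_MCP_TRANSPORT": "streamable-http",
    "HARRIER_MCP_HOST": "127.0.0.1",
    "HARRIER_MCP_PORT": "8000",
    "HARRIER_MCP_PATH": "/mcp",
}


def resolve_mcp_settings(env=None):
    """Return the transport and endpoint URL implied by the environment."""
    env = os.environ if env is None else env
    # Unset or empty variables fall back to the documented default.
    values = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    path = values["HARRIER_MCP_PATH"]
    if not path.startswith("/"):
        path = "/" + path
    host = values["HARRIER_MCP_HOST"]
    port = values["HARRIER_MCP_PORT"]
    return {
        "transport": values["HARRIER_MCP_TRANSPORT"],
        "url": f"http://{host}:{port}{path}",
    }
```

With an empty mapping this yields the default local endpoint; overriding only `HARRIER_MCP_PORT` changes just the port segment.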
Versioning Policy
Harrier treats these fields as stable public contract once releases begin:
- tool names
- required request fields
- response top-level fields
- diagnosis status values
- finding category names
- recommendation type names
New optional fields may be added in minor releases. Required-field changes, removed fields, or renamed categories require a major release.
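One way to read this policy is as a comparison between two versions of a tool's request fields. The sketch below is a minimal interpretation, assuming a contract can be described by a set of required and a set of optional field names; `classify_release` and the `"none"` result are hypothetical and not part of Harrier.

```python
def classify_release(old_required, old_optional, new_required, new_optional):
    """Classify a field-level contract change as 'major', 'minor', or 'none'."""
    old_required, new_required = set(old_required), set(new_required)
    old_all = old_required | set(old_optional)
    new_all = new_required | set(new_optional)

    # Any change to the required set is a breaking change.
    if old_required != new_required:
        return "major"
    # Removing a previously accepted field is a breaking change.
    if old_all - new_all:
        return "major"
    # Only new optional fields were added.
    if new_all - old_all:
        return "minor"
    return "none"
```

Renamed categories are not modelled here; under the policy above they would also call for a major release.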
harrier_start_emr_investigation
Starts an investigation, stores a report, and returns the current best diagnosis.
Required Fields
| Field | Description |
|---|---|
| account_id | 12-digit AWS account ID. |
| region | AWS region. |
| runtime target | Runtime-specific target IDs. |
runtime defaults to emr_ec2.
Runtime Targets
EMR On EC2
Legacy flat request:
{
"account_id": "111122223333",
"region": "ap-southeast-2",
"runtime": "emr_ec2",
"cluster_id": "j-1234567890ABC",
"step_id": "s-1234567890ABC",
"application_id": "application_1748300000000_0001"
}
Preferred target request:
{
"account_id": "111122223333",
"region": "ap-southeast-2",
"runtime": "emr_ec2",
"target": {
"cluster_id": "j-1234567890ABC",
"step_id": "s-1234567890ABC",
"yarn_application_id": "application_1748300000000_0001"
},
"deploy_mode": "cluster",
"job_state": "failed",
"time_window": {
"start": "2026-06-01T05:00:00Z",
"end": "2026-06-01T05:30:00Z"
}
}
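A client that still holds legacy flat requests could convert them to the preferred shape before sending. The sketch below assumes that the flat `application_id` corresponds to `yarn_application_id` in the target (inferred from the two examples carrying the same value); `to_target_request` is a hypothetical helper, not a Harrier API.

```python
# Flat key -> target key. The application_id mapping is inferred from the
# examples above and is an assumption.
FLAT_TO_TARGET = {
    "cluster_id": "cluster_id",
    "step_id": "step_id",
    "application_id": "yarn_application_id",
}


def to_target_request(flat):
    """Convert a legacy flat emr_ec2 request into the preferred target shape."""
    # Copy every field that is not a target ID unchanged.
    request = {key: value for key, value in flat.items() if key not in FLAT_TO_TARGET}
    # runtime defaults to emr_ec2 when omitted.
    request.setdefault("runtime", "emr_ec2")
    request["target"] = {
        target_key: flat[flat_key]
        for flat_key, target_key in FLAT_TO_TARGET.items()
        if flat_key in flat
    }
    return request
```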
EMR Serverless
{
"account_id": "111122223333",
"region": "ap-southeast-2",
"runtime": "emr_serverless",
"target": {
"serverless_application_id": "00f1abcd2efg3hij",
"job_run_id": "00f1abcd2efg3hij-000001",
"attempt": 1
},
"job_state": "failed"
}
EMR On EKS
{
"account_id": "111122223333",
"region": "ap-southeast-2",
"runtime": "emr_eks",
"target": {
"virtual_cluster_id": "vc-1234567890abcdef0",
"job_run_id": "job-run-123",
"eks_cluster_name": "analytics-dev",
"namespace": "emr-jobs"
},
"job_state": "failed"
}
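Because each runtime uses different target IDs, a client may want to catch keys that belong to another runtime before sending a request. The sketch below only knows the keys that appear in the examples above; whether each key is required or optional is not stated here, so it flags unexpected keys rather than missing ones. `unexpected_target_keys` is a hypothetical helper.

```python
# Target keys as they appear in the request examples above. This is not an
# authoritative schema.
EXAMPLE_TARGET_KEYS = {
    "emr_ec2": {"cluster_id", "step_id", "yarn_application_id"},
    "emr_serverless": {"serverless_application_id", "job_run_id", "attempt"},
    "emr_eks": {"virtual_cluster_id", "job_run_id", "eks_cluster_name", "namespace"},
}


def unexpected_target_keys(request):
    """Return target keys that do not appear in the examples for the runtime."""
    runtime = request.get("runtime", "emr_ec2")  # documented default
    if runtime not in EXAMPLE_TARGET_KEYS:
        raise ValueError(f"unknown runtime: {runtime}")
    target = request.get("target", {})
    return sorted(set(target) - EXAMPLE_TARGET_KEYS[runtime])
```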
Optional Fields
| Field | Values | Notes |
|---|---|---|
| deploy_mode | client, cluster, unknown | EC2 Spark driver log layout hint. |
| job_state | running, completed, failed, unknown | Used for failed and running investigations. |
| time_window | ISO timestamps | Helps bound log and metric reads. |
| repo | provider, owner, name, branch | Enables repository-aware recommendations. |
| database_connection | string alias | Enables read-only DB diagnostics where configured. |
| diagnostic_signals | normalized metrics | Used by tests, demos, and running-job diagnosis. |
| create_pr | boolean | Deprecated in favor of harrier_prepare_pr; keep false. |
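The enumerated values in this table lend themselves to a client-side check before a request is sent. The following is a minimal sketch under the assumption that a caller wants a list of human-readable problems; `check_optional_fields` is hypothetical, and the server's own validation may differ.

```python
# Allowed values copied from the Optional Fields table.
ALLOWED_VALUES = {
    "deploy_mode": {"client", "cluster", "unknown"},
    "job_state": {"running", "completed", "failed", "unknown"},
}


def check_optional_fields(request):
    """Return a list of problems found in the optional fields of a request."""
    problems = []
    for field, allowed in ALLOWED_VALUES.items():
        if field in request and request[field] not in allowed:
            problems.append(
                f"{field}={request[field]!r} is not one of {sorted(allowed)}"
            )
    # create_pr is deprecated and should stay false.
    if request.get("create_pr"):
        problems.append("create_pr is deprecated; use harrier_prepare_pr instead")
    return problems
```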
Response
{
"investigation_id": "inv-local-abc123",
"status": "completed",
"root_cause": {
"category": "S3_ACCESS_DENIED",
"summary": "S3 permissions prevent the job from reading or writing an object path.",
"confidence": 0.95,
"affected_component": "s3"
},
"diagnosis_report": {},
"evidence_summary": [],
"recommendations": [],
"next_tools": [
"harrier_get_investigation_report",
"harrier_get_evidence",
"harrier_prepare_pr"
],
"request": {},
"warnings": []
}
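For logging or chat output, a caller might condense this response into one line. The sketch below uses only the fields shown in the example response; the fallback strings for a missing root_cause and the `summarize_investigation` name are assumptions for illustration.

```python
def summarize_investigation(response):
    """Build a one-line summary from a harrier_start_emr_investigation response."""
    root = response.get("root_cause") or {}
    category = root.get("category", "no root cause reported")  # assumed fallback
    confidence = root.get("confidence", 0.0)
    component = root.get("affected_component", "unknown")  # assumed fallback
    line = (
        f"{response['investigation_id']} [{response['status']}]: "
        f"{category} ({confidence:.0%} confidence, component {component})"
    )
    warnings = response.get("warnings", [])
    if warnings:
        line += f"; {len(warnings)} warning(s)"
    return line
```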
Diagnosis Report Contract
The diagnosis_report field contains structured data plus a generated Markdown rendering.
It is always an initial triage report, not a final root cause analysis (RCA).
Key fields:
| Field | Purpose |
|---|---|
| phase | Current report phase. Currently Initial Triage. |
| notice | Human disclaimer that this is not final RCA. |
| final_root_cause_analysis | Always false for initial triage. |
| likely_area | Infrastructure, Data, Spark Runtime, Kubernetes, Observability, or Configuration. |
| most_suspicious_signal | Finding category that should drive the next investigation path. |
| confidence | Human confidence label. |
| area_summaries | Top-level board grouped by area. |
| visual_check_tree | Runtime-specific pass/fail/inconclusive tree. |
| evidence_cards | Human evidence cards. |
| log_excerpts | Bounded fenced log snippets. |
| inconclusive_checks | Checks that need more evidence. |
| next_steps | Suggested detailed investigation path. |
| human_report_markdown | Operator-facing Markdown rendering. |
Status values:
PASS
ISSUE
WARN
UNKNOWN
NOT_CHECKED
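A consumer can model these status values as an enumeration and count them across checks. The exact shape of visual_check_tree and area_summaries is not specified here, so the sketch below assumes a flat list of dictionaries with a `status` key; that shape, `CheckStatus`, and `tally_statuses` are all hypothetical.

```python
from collections import Counter
from enum import Enum


class CheckStatus(str, Enum):
    """Status values listed in the diagnosis report contract."""

    PASS = "PASS"
    ISSUE = "ISSUE"
    WARN = "WARN"
    UNKNOWN = "UNKNOWN"
    NOT_CHECKED = "NOT_CHECKED"


def tally_statuses(checks):
    """Count checks per status; raises ValueError on an unlisted status."""
    counts = Counter(CheckStatus(check["status"]) for check in checks)
    # Report every status, including those with zero occurrences.
    return {status.value: counts.get(status, 0) for status in CheckStatus}
```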
harrier_get_investigation_report
Fetches the full stored report.
Request
{
"investigation_id": "inv-local-abc123",
"format": "json"
}
format may be json or markdown. The response includes both structured
fields and human_report_markdown when available.
Response
{
"investigation_id": "inv-local-abc123",
"executive_summary": "Harrier identified S3_ACCESS_DENIED with 95% confidence.",
"technical_summary": "Primary finding: S3 permissions prevent the job from reading or writing an object path.",
"diagnosis_report": {},
"human_report_markdown": "# Harrier Initial Diagnosis Report\n...",
"timeline": [],
"evidence": [],
"findings": [],
"recommendations": [],
"validation_steps": [],
"rollback_plan": []
}
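Since human_report_markdown is only present when available, a client that displays the report may want a fallback. The sketch below falls back to the two summary fields from the example response; the fallback choice and the `render_report` helper are assumptions, not documented Harrier behaviour.

```python
def render_report(report):
    """Return operator-facing text for a stored investigation report."""
    markdown = report.get("human_report_markdown")
    if markdown:
        return markdown
    # Fallback (assumption): join whichever summaries are present.
    parts = [report.get("executive_summary"), report.get("technical_summary")]
    return "\n\n".join(part for part in parts if part)
```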
harrier_get_evidence
Returns evidence for an investigation, optionally filtered by source type and severity.
Request
{
"investigation_id": "inv-local-abc123",
"evidence_type": "all",
"severity": "high"
}
Evidence types:
driver_logs
executor_logs
controller_logs
metrics
spark
iam
code
db
all
Severity values:
low
medium
high
critical
Response
{
"investigation_id": "inv-local-abc123",
"evidence_type": "all",
"evidence": [
{
"id": "ev-s3-log-001",
"source": "driver_logs",
"source_uri": "s3://example/logs/stdout.gz",
"timestamp": null,
"severity": "high",
"signal": "Log contains an S3 or AWS access denied signal.",
"excerpt": "AccessDenied while reading s3://example/input.csv",
"redacted": true,
"linked_finding": "finding-001"
}
]
}
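The tool filters by a single severity value, so a client that wants "high and above" may need to post-process the evidence list. The sketch below assumes the severity values are ordered as listed (low, medium, high, critical) and keeps entries at or above a threshold, most severe first; `at_least` is a hypothetical client-side helper.

```python
# Order assumed from the Severity values list above.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def at_least(evidence, minimum):
    """Keep evidence at or above `minimum` severity, most severe first."""
    threshold = SEVERITY_ORDER.index(minimum)  # ValueError on unlisted values
    kept = [
        item for item in evidence
        if SEVERITY_ORDER.index(item["severity"]) >= threshold
    ]
    # Stable sort: entries with equal severity keep their original order.
    return sorted(
        kept,
        key=lambda item: SEVERITY_ORDER.index(item["severity"]),
        reverse=True,
    )
```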
harrier_prepare_pr
Builds a dry-run PR preview by default. Real PR creation is blocked unless all write gates pass.
Request
{
"investigation_id": "inv-local-abc123",
"repo": {
"provider": "github",
"owner": "example",
"name": "analytics-platform",
"base_branch": "main"
},
"recommendation_ids": [],
"dry_run": true,
"allow_pr_creation": false
}
Response
{
"status": "prepared",
"branch": "harrier/inv-local-abc123",
"files_to_change": [
"docs/harrier/inv-local-abc123.md"
],
"diff_summary": [
{
"file": "docs/harrier/inv-local-abc123.md",
"change": "Add Harrier investigation summary, evidence, findings, and recommendations."
}
],
"pr_title": "chore: add Harrier investigation for S3_ACCESS_DENIED",
"pr_body": "## Harrier Investigation\n...",
"pr_url": null,
"reason": null
}
Write Gates
Real PR creation requires all of:
- request has dry_run=false
- request has allow_pr_creation=true
- server has HARRIER_ALLOW_PR_CREATION=true
- repository is in HARRIER_PR_REPO_ALLOWLIST
- GitHub token is configured
Harrier never merges PRs.
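The gates above can be read as a conjunction: if any one fails, the request stays a dry run. The sketch below lists which gates are blocking for a given request. The allowlist format (comma-separated `owner/name` entries), the case-insensitive `true` comparison, and the `pr_creation_blockers` helper are assumptions; the actual server-side checks are not shown in this document.

```python
def pr_creation_blockers(request, env, token_configured):
    """Return the write gates that currently block real PR creation."""
    blockers = []
    if request.get("dry_run", True) is not False:
        blockers.append("request does not have dry_run=false")
    if request.get("allow_pr_creation", False) is not True:
        blockers.append("request does not have allow_pr_creation=true")
    if env.get("HARRIER_ALLOW_PR_CREATION", "").lower() != "true":
        blockers.append("server does not have HARRIER_ALLOW_PR_CREATION=true")

    # Assumed allowlist format: "owner/name,owner/name".
    allowlist = {
        entry.strip()
        for entry in env.get("HARRIER_PR_REPO_ALLOWLIST", "").split(",")
        if entry.strip()
    }
    repo = request.get("repo", {})
    if f"{repo.get('owner')}/{repo.get('name')}" not in allowlist:
        blockers.append("repository is not in HARRIER_PR_REPO_ALLOWLIST")

    if not token_configured:
        blockers.append("GitHub token is not configured")
    return blockers
```

An empty list means every gate passes; the example request earlier in this section (dry_run true, allow_pr_creation false) would be blocked by at least the first two gates.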
Recoverable Warnings
Collectors should return warnings instead of failing the whole investigation for:
- AWS AccessDenied
- missing or expired clusters/jobs
- logs not yet available
- missing metrics
- unavailable Kubernetes config
- Kubernetes RBAC denial
AWS DevOps Agent should show warnings and inconclusive_checks when evidence
is incomplete.
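The warning-instead-of-failure pattern can be sketched as a loop that records recoverable collector errors and continues. `RecoverableCollectorError`, `run_collectors`, and the collector callables are hypothetical names used for illustration; Harrier's real collector interface is not described in this document.

```python
class RecoverableCollectorError(Exception):
    """Hypothetical error for conditions such as AccessDenied or missing logs."""


def run_collectors(collectors):
    """Run each collector; turn recoverable failures into warnings."""
    evidence, warnings = [], []
    for name, collect in collectors.items():
        try:
            evidence.extend(collect())
        except RecoverableCollectorError as exc:
            # Keep going: one missing source should not fail the investigation.
            warnings.append(f"{name}: {exc}")
    return {"evidence": evidence, "warnings": warnings}
```

Errors that are not recoverable still propagate, so genuine bugs are not silently converted into warnings.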