
Demo Lab

Harrier EMR Demo Lab is the companion repository for controlled AWS validation. It creates disposable EMR incidents so Harrier can be tested against real Spark failure evidence rather than synthetic fixtures alone.

The production MCP server lives in harrier-emr-mcp. The demo lab owns the AWS infrastructure, Spark jobs, scenario runners, expected findings, validation harness, alarms, cleanup scripts, and cost-control docs.

What It Covers

| Runtime | Coverage | Example Evidence |
| --- | --- | --- |
| EMR on EC2 | Broad scenario coverage | EMR steps, YARN application IDs, S3 logs, CloudWatch metrics |
| EMR Serverless | Focused Spark failure coverage | job run metadata, S3 logs, CloudWatch logs and metrics |
| EMR on EKS | Focused Spark and Kubernetes coverage | EMR Containers job runs, pod status, S3 logs, CloudWatch logs |
| MWAA local runner | Orchestration demo | scenario DAGs packaged for ECS Fargate |

Scenario Catalog

| Scenario | Runtime Coverage | Expected Finding |
| --- | --- | --- |
| happy_path | EC2, Serverless, EKS | success |
| executor_oom | EC2, Serverless, EKS | EXECUTOR_OOM |
| driver_oom | EC2 | DRIVER_OOM |
| missing_dependency | EC2, Serverless | DEPENDENCY_MISSING |
| s3_access_denied | EC2, EKS | S3_ACCESS_DENIED |
| s3_path_missing | EC2, Serverless | S3_PATH_MISSING |
| bad_input_data | EC2, Serverless | BAD_INPUT_DATA |
| image_pull_failure | EKS | EKS_IMAGE_PULL_FAILURE |
| pod_pending_resource_pressure | EKS | EKS_POD_PENDING |

The demo lab also includes EC2-focused data, SQL, storage, and Livy scenarios such as data_skew, hdfs_full, db_connection_failure, db_lock_timeout, db_large_join_spill, db_bad_sql_plan, and livy_session_failure.
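As a rough illustration of how the catalog above could be consumed by a test harness, the sketch below encodes the listed scenarios as a lookup table. The scenario names, runtimes, and expected findings are copied from the catalog; the `Scenario` type, the `CATALOG` dictionary, and both helper functions are hypothetical and are not taken from the demo lab repository.

```python
from typing import NamedTuple


class Scenario(NamedTuple):
    """Hypothetical record type; not the demo lab's actual data model."""
    runtimes: frozenset
    expected_finding: str


# Values copied from the scenario catalog; the structure is illustrative only.
CATALOG = {
    "happy_path": Scenario(frozenset({"EC2", "Serverless", "EKS"}), "success"),
    "executor_oom": Scenario(frozenset({"EC2", "Serverless", "EKS"}), "EXECUTOR_OOM"),
    "driver_oom": Scenario(frozenset({"EC2"}), "DRIVER_OOM"),
    "missing_dependency": Scenario(frozenset({"EC2", "Serverless"}), "DEPENDENCY_MISSING"),
    "s3_access_denied": Scenario(frozenset({"EC2", "EKS"}), "S3_ACCESS_DENIED"),
    "s3_path_missing": Scenario(frozenset({"EC2", "Serverless"}), "S3_PATH_MISSING"),
    "bad_input_data": Scenario(frozenset({"EC2", "Serverless"}), "BAD_INPUT_DATA"),
    "image_pull_failure": Scenario(frozenset({"EKS"}), "EKS_IMAGE_PULL_FAILURE"),
    "pod_pending_resource_pressure": Scenario(frozenset({"EKS"}), "EKS_POD_PENDING"),
}


def expected_finding(scenario: str, runtime: str) -> str:
    """Return the expected finding, rejecting runtimes the catalog does not list."""
    entry = CATALOG[scenario]
    if runtime not in entry.runtimes:
        raise ValueError(f"{scenario} is not listed for runtime {runtime}")
    return entry.expected_finding


def scenarios_for(runtime: str) -> list:
    """Return the scenario names listed for a runtime, sorted alphabetically."""
    return sorted(name for name, s in CATALOG.items() if runtime in s.runtimes)
```

For example, `expected_finding("driver_oom", "EC2")` returns `"DRIVER_OOM"`, while asking for `driver_oom` on `"EKS"` raises `ValueError` because the catalog lists that scenario for EC2 only.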

Validation Flow

flowchart LR
  Deploy["Deploy demo infra"] --> Scenario["Run scenario"]
  Scenario --> Context["Export Harrier context"]
  Context --> MCP["Call Harrier MCP"]
  MCP --> Compare["Compare expected finding"]
  Compare --> Report["Write validation report"]
  Report --> Cleanup["Cleanup or destroy"]

Validation reports are written under .harrier-demo/validation/ in the demo repository and should not be committed.
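The last steps of the flow (compare the expected finding, then write a report) might look roughly like the sketch below. Only the `.harrier-demo/validation/` directory comes from this page; the function names, the JSON report fields, and the file naming scheme are assumptions made for illustration, and the Harrier MCP call itself is left out, so the actual finding is passed in as a plain string.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Directory named on this page; everything else in this sketch is hypothetical.
REPORT_DIR = Path(".harrier-demo/validation")


def compare_finding(scenario: str, runtime: str, expected: str, actual: str) -> dict:
    """Build a result record; `passed` is a simple exact-match comparison."""
    return {
        "scenario": scenario,
        "runtime": runtime,
        "expected": expected,
        "actual": actual,
        "passed": expected == actual,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def write_report(result: dict, report_dir: Path = REPORT_DIR) -> Path:
    """Write the result as JSON (assumed format) and return the file path."""
    report_dir = Path(report_dir)
    report_dir.mkdir(parents=True, exist_ok=True)
    name = f"{result['scenario']}-{result['runtime'].lower()}.json"
    path = report_dir / name
    path.write_text(json.dumps(result, indent=2))
    return path
```

In a real run, `actual` would come from the Harrier MCP response rather than a hard-coded string. Because reports should not be committed, adding `.harrier-demo/` to the repository's `.gitignore` is one way to keep them out of version control.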

Safety First

The demo lab creates real AWS resources and incurs real AWS costs. Use a sandbox account, review the cost and cleanup docs before deployment, and destroy resources when validation is complete.

Repository

The demo lab repository is:

https://github.com/the-platform-layer/harrier-emr-demo-lab

If you have access to the repository, start with its README, cost controls, scenario catalog, and cleanup guide before running live workloads.