TAM Observability Builder

A customer-facing observability builder for TAM-led Puppet Enterprise demos, live dashboard storytelling, screenshot-ready deck prep, and exportable starter packs. Discovery notes stay local. Fake data stays fake.

Open Grafana Demo
No server-side note storage. Discovery stays in-browser until exported locally at the end of the session.
Demo control room: always fake data
Default live path: Grafana demo

Keep a live dashboard one click away for discovery calls and demos.

Export mode: live demo

Switch between live demo, screenshot deck, and customer export motions.

Dashboards queued: 3

Each dashboard can have its own refresh rate and fake data profile.
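A per-dashboard refresh rate and fake data profile can be sketched as a small record; the field names below are illustrative only, not the builder's actual schema:

```python
from dataclasses import dataclass

@dataclass
class QueuedDashboard:
    # Illustrative shape only; the builder's real schema is not shown here.
    title: str
    refresh_seconds: int     # per-dashboard refresh cadence
    fake_data_profile: str   # "healthy", "degraded", or "large-scale"

queue = [
    QueuedDashboard("PE At A Glance", 60, "healthy"),
    QueuedDashboard("Incident Signals", 15, "degraded"),
    QueuedDashboard("Infrastructure Health", 30, "large-scale"),
]
```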

Fake data jobs: 3

Generation is staged and bundled at the end of the workflow.

Grafana
InfluxDB
Docker
GitHub
Next.js
1. Engagement mode

Choose the outcome for this session

2. Discovery

Guide the customer conversation

Nothing is stored automatically. Use Save Session at the end to download a local JSON handoff.
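The downloaded handoff could look something like the sketch below; the field names and note text are assumptions for illustration, not the builder's real export format:

```python
import json
import os
import tempfile

# Hypothetical session handoff; field names are illustrative only.
session = {
    "engagement_mode": "live demo",
    "discovery_notes": ["Customer cares most about patch windows"],
    "dashboards": ["PE At A Glance", "Incident Signals"],
    "fake_data_profile": "healthy",
}

# Written only when the operator chooses to export; nothing is sent remotely.
path = os.path.join(tempfile.gettempdir(), "tam-session-handoff.json")
with open(path, "w") as f:
    json.dump(session, f, indent=2)
```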

3. Dashboard builder

Select dashboards, fake data plans, and refresh cadence

Executive overview

PE At A Glance

Compresses platform posture into a first-screen demo that makes the rest of the story legible.

Persona: TAM sponsor
Live: Lead with a board-ready signal before diving into service depth.
Customer: Starter scorecard for cross-functional PE health reviews.
Panels: Total Nodes · Successful Runs · Failure Rate · Average Run Duration
Executive overview

Environment Health

Separates broad platform health from localized trouble so follow-up dashboards stay focused.

Persona: Operations lead
Live: Show whether environments are equally healthy or one segment carries most of the operational drag.
Customer: Useful for environment-by-environment health checkpoints.
Panels: Environment Success Rate · Corrective Changes · Orchestrator Queue Depth · PE Service Health
Leadership

PE Leadership Scorecard

Lets a TAM connect observability to risk reduction and platform confidence without drowning in raw metrics.

Persona: Executive stakeholder
Live: Translate PE telemetry into operational confidence, velocity, and change trust.
Customer: Monthly review artifact for leadership or service owners.
Panels: Change Success Rate · Node Coverage · Corrective Drift · Incident Pressure
Platform admin

Infrastructure Health

Infrastructure saturation is where PE complaints often begin before they are recognized as platform issues.

Persona: Platform admin
Live: Show if the bottleneck is compute, storage, or service saturation.
Customer: Core operational dashboard for day-two ownership.
Panels: CPU Saturation · Memory Pressure · Disk Utilization · PostgreSQL Connections
Operations / SRE

Incident Signals

Builds confidence that the demo is not just executive paint but operationally useful telemetry.

Persona: SRE or incident commander
Live: Show what the team would notice first during a live degradation.
Customer: Incident room dashboard with fast refresh.
Panels: Run Failures · HTTP 5xx Proxy · Compile Spikes · Queue Saturation
Operations / SRE

Failure Analysis

Turns alert noise into actionable narrowing, which is usually where PE teams lose time.

Persona: Platform engineer
Live: Move from “something is wrong” to “which nodes, services, or resources explain it.”
Customer: A troubleshooting view for recurrent run failures and drift.
Panels: Top Failing Nodes · Failure Count by Resource · Run Duration Outliers · Corrective Drift Hotspots
Advanced

Run Telemetry

This is where the demo proves the platform can go beyond health lights into workflow detail.

Persona: PE operator
Live: Show advanced run and event detail for customers who already want operational depth.
Customer: Useful when the customer has enough data maturity to maintain advanced views.
Panels: Per Node Run Duration · Resource Evaluation Time · Detailed Event Metrics · Cross Node Run Comparison
Advanced

Patching Operations

Patch visibility is one of the easiest TAM wins because it turns routine work into measurable confidence.

Persona: Operations lead
Live: Show whether patching is controlled, timely, and consistent across the estate.
Customer: Useful for maintenance windows, compliance reviews, and patch-cycle retrospectives.
Panels: Patch Completion Rate · Patch Errors By Node · Time To Patch · Patch Window Compliance
4. Fake data strategy

Stage synthetic data only when the dashboard mix is final

Healthy baseline

Stable service timings, clean runs, modest queue depth, and steady node compliance.

  • executive demos
  • first discovery session
  • customer-ready starter exports

Degraded incident

Higher failure counts, slower compile times, queue growth, and visible infrastructure strain.

  • incident workflows
  • operations storytelling
  • showing observability value quickly

Large-scale estate

Higher fleet sizes, wider environment spread, and larger backlog pressure without collapsing health.

  • enterprise sizing
  • capacity planning
  • architecture conversations
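A minimal sketch of how the three profiles might shape a batch of synthetic run telemetry; the parameter values and field names are invented for illustration, and the real generators live in the project's scripts:

```python
import random

# Illustrative profile parameters; real generator settings are not shown here.
PROFILES = {
    "healthy":     {"nodes": 200,  "failure_rate": 0.01, "queue_depth": (0, 3)},
    "degraded":    {"nodes": 200,  "failure_rate": 0.12, "queue_depth": (5, 25)},
    "large-scale": {"nodes": 5000, "failure_rate": 0.02, "queue_depth": (2, 10)},
}

def synth_run_batch(profile: str, seed: int = 0) -> dict:
    """Generate one batch of fake run telemetry for a named profile."""
    p = PROFILES[profile]
    rng = random.Random(seed)  # seeded so demo data is reproducible
    failed = sum(rng.random() < p["failure_rate"] for _ in range(p["nodes"]))
    return {
        "total_runs": p["nodes"],
        "failed_count": failed,
        "succeeded_count": p["nodes"] - failed,
        "queue_depth": rng.randint(*p["queue_depth"]),
    }
```

All output stays synthetic: the degraded profile simply draws failures at a higher rate and a deeper queue than the healthy baseline.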

Generated runbook

Use this after discovery, not during it. All profiles remain synthetic.

PROFILE=healthy ./scripts/start-demo.sh healthy
PROFILE=degraded ./scripts/start-demo.sh degraded
5. Screenshot studio

Guide the tech demo capture flow

1

Finalize the dashboard mix and mark which ones should receive fake data.

2

Run the capture flow once the data profile is loaded and the refresh cadence is stable.

3

Export slide notes so each screenshot can drop directly into PowerPoint with operator guidance.
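Step 3 can be sketched as pairing each captured screenshot with its operator guidance; the file names and note text below are placeholders, not the capture script's real outputs:

```python
# Placeholder screenshots and notes; the capture flow's real outputs differ.
captures = [
    ("pe-at-a-glance.png", "Open with fleet scale before discussing health."),
    ("incident-signals.png", "Point at Run Failures first, then queue depth."),
]

def slide_notes(captures: list[tuple[str, str]]) -> str:
    """Render one notes line per screenshot for pasting into PowerPoint."""
    return "\n".join(f"{png}: {note}" for png, note in captures)
```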

npm install
npx playwright install chromium
./scripts/capture-demo.sh
6. Metric catalog

Sort by metric, persona, or category before building a customer dashboard pack

Metric | Category | Persona | Source | Why it matters
Average Run Duration | Performance | PE operator, Platform admin | puppet_reports.average_run_duration | Longer runs usually precede trust erosion and missed windows.
Change Success Rate | Change management | Executive stakeholder, Operations lead | puppet_reports.succeeded_count / total_runs | Converts technical execution into a trust metric for change processes.
Compile Spikes | Performance | Platform engineer, PE operator | puppetserver.compile_time_p95 | Explains why runs feel worse even before broad failure rates climb.
Corrective Changes | Drift | Operations lead, Executive stakeholder | puppet_reports.corrective_count | Measures how much of the platform is reacting to drift instead of planned change.
Corrective Drift Hotspots | Drift | Operations lead, Platform engineer | puppet_reports.corrective_count by node or environment | Reveals where declared state is least trustworthy.
CPU Saturation | Infrastructure | Platform admin, PE operator | system_cpu.usage | CPU pressure often explains compile or orchestrator slowdowns.
Cross Node Run Comparison | Advanced run telemetry | PE operator | puppet_data_connector top run_duration by certname | Makes cross-node tuning conversations concrete.
Detailed Event Metrics | Advanced run telemetry | PE operator, Operations lead | puppet_data_connector.event_count/change_count/failure_count | Turns run detail into a tunable workload story.
Disk Utilization | Infrastructure | Platform admin | system_disk.percent_used | Storage bottlenecks affect reports, DB performance, and stability.
Failure Count by Resource | Triage | Platform engineer | puppet_events.failure_count by resource_type | Shows whether failures cluster around one class of managed object.
Failure Rate | Run health | Operations lead, SRE or incident commander | puppet_reports.failed_count / total_runs | Highlights customer-facing risk faster than raw fail counts alone.
HTTP 5xx Proxy | Services | SRE or incident commander | service failure proxy from status and error counters | Creates a service-facing signal customers immediately understand.
Incident Pressure | Incident | SRE or incident commander, Operations lead | composite of failures, queue depth, and 5xx proxies | Gives an at-a-glance signal for whether operators should stay in incident mode.
Memory Pressure | Infrastructure | Platform admin, PE operator | system_memory.percent_used | Memory pressure can hide behind intermittent failures and restarts.
Node Coverage | Fleet | Executive stakeholder, TAM sponsor | puppet_inventory.node_count by environment | Shows whether the dashboard scope matches the actual managed estate.
Orchestrator Queue Depth | Workflow | Operations lead, PE operator | orchestrator.deploy_queue_length | Explains whether slow outcomes come from demand piling up.
Patch Completion Rate | Patching | Operations lead, Executive stakeholder | puppet_data_connector.patch_job_status | Measures whether patch jobs reach the finish line reliably.
Patch Errors By Node | Patching | Operations lead, Platform engineer | puppet_data_connector.patch_node_error_count | Prioritizes remediation for patching friction.
Patch Window Compliance | Patching | Executive stakeholder, Operations lead | derived patch_job_status over time | Connects patching to governance and change trust.
PE Service Health | Services | Platform admin, Operations lead | orchestrator / puppetdb / puppetserver service status | Separates service degradation from workload-driven slowdowns.
Per Node Run Duration | Advanced run telemetry | PE operator | puppet_data_connector.run_duration | Useful for node-level comparison in larger estates.
PostgreSQL Connections | Database | Platform admin, PE operator | postgresql.connections | Connection stress exposes database saturation early.
Queue Saturation | Workflow | SRE or incident commander, Operations lead | orchestrator.deploy_queue_length and throughput | Shows if operational demand is outrunning system capacity.
Resource Evaluation Time | Advanced run telemetry | PE operator | puppet_data_connector.resource_evaluation_time | Highlights catalog evaluation cost beyond total run time.
Run Duration Outliers | Performance | Platform engineer, PE operator | puppet_reports.run_duration top values | Outliers point to issues averages flatten away.
Run Failures | Incident | SRE or incident commander, Platform engineer | puppet_reports.failed_count | Fastest way to show that policy execution is no longer normal.
Successful Runs | Run health | Operations lead, Executive stakeholder | puppet_reports.succeeded_count | Shows whether configuration management is completing as expected.
Time To Patch | Patching | Operations lead | puppet_data_connector.patch_job_duration | Duration matters as much as success during maintenance windows.
Top Failing Nodes | Triage | Platform engineer, PE operator | puppet_reports.failed_count by certname | Prioritizes follow-up by highest operational pain.
Total Nodes | Fleet | Executive stakeholder, TAM sponsor | puppet_inventory.node_count | Establishes scale before discussing health or efficiency.
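Several catalog entries are ratios or composites of the raw counters. A hedged sketch of those derivations follows; the ratio formulas come from the Source column above, while the composite weights for Incident Pressure are invented for illustration, not the dashboard's actual formula:

```python
def failure_rate(failed_count: int, total_runs: int) -> float:
    # puppet_reports.failed_count / total_runs
    return failed_count / total_runs if total_runs else 0.0

def change_success_rate(succeeded_count: int, total_runs: int) -> float:
    # puppet_reports.succeeded_count / total_runs
    return succeeded_count / total_runs if total_runs else 0.0

def incident_pressure(failures: int, queue_depth: int, http_5xx: int) -> float:
    # Composite of failures, queue depth, and 5xx proxies.
    # Weights below are illustrative only.
    return 1.0 * failures + 0.5 * queue_depth + 2.0 * http_5xx
```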

7. Agentic usage

Best practices when an engineering agent is helping build the demo

Keep customer data out

Agents should only work with the provided fake data profiles, dashboard JSON, and exported local notes. Never paste real customer telemetry into the builder.

Separate discovery from execution

Use the discovery prompts first, finalize the dashboard plan second, and only then ask the agent to run capture or export commands.

Prefer explicit outputs

Have the agent export slide notes, fake data commands, and customer handoff artifacts instead of relying on implicit state.

Review before sharing

Treat the generated dashboard pack as a draft. Review metric naming, refresh cadence, and persona mapping before customer handoff.

8. Session closeout

Export locally

Local-only guarantee

No automatic saves. No remote persistence. No customer data retention. Notes export only when the operator chooses to download them.

Refresh guidance
15 seconds

Use for live command demos or incident walk-throughs.

30 seconds

Use for TAM-led discovery when change is visible but calmer pacing helps.

1 minute

Use for leadership scorecards and broader platform reviews.

5 minutes

Use for static screenshots and exported customer starter packs.
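The guidance above maps naturally to a small lookup; the motion names here are invented labels for the four cases:

```python
# Illustrative mapping of demo motion to refresh cadence, in seconds.
REFRESH_SECONDS = {
    "incident_walkthrough": 15,   # live command demos
    "tam_discovery": 30,          # change visible, calmer pacing
    "leadership_scorecard": 60,   # broader platform reviews
    "static_screenshot": 300,     # screenshots and starter packs
}

def refresh_for(motion: str, default: int = 60) -> int:
    """Return the suggested refresh cadence, falling back to one minute."""
    return REFRESH_SECONDS.get(motion, default)
```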

Flow

How the workflow moves from discovery to export

flowchart LR
    A[Discovery prompts] --> B[Select personas and dashboards]
    B --> C[Choose fake data profiles]
    C --> D[Set refresh cadence per dashboard]
    D --> E[Open live Grafana demo]
    D --> F[Run screenshot capture flow]
    D --> G[Export customer dashboard pack]
    A --> H[Local-only session export]
    F --> I[PowerPoint slide notes]
    G --> J[Customer installation handoff]