HealthInfo Reporting

Purpose

HealthInfo provides structured, human-readable diagnostic information that explains why a component is in its current HealthState (OK, DEGRADED, FAILED, or UNKNOWN).

While HealthState gives you the high-level status, HealthInfo tells you the specific reason(s), which is especially useful when troubleshooting failures or degraded behaviour.

Key characteristics:

  • Only populated (non-empty) when HealthState is DEGRADED, FAILED, or UNKNOWN

  • Empty ([]) when HealthState is OK

  • Updated in sync with HealthState changes

  • Published as an on-change event
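The characteristics above can be illustrated with a small publisher that changes HealthState and HealthInfo together, forces HealthInfo to [] when the state is OK, and notifies subscribers only when something actually changed. This is a minimal sketch under stated assumptions: the HealthInfoPublisher class, its update() method and the callback signature are hypothetical and are not the TMC implementation, which publishes Tango change events.

```python
import json
from enum import Enum
from typing import Callable, Dict, List


class HealthState(Enum):
    OK = "OK"
    DEGRADED = "DEGRADED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


# Hypothetical callback type: receives the new HealthState and the serialized HealthInfo.
Callback = Callable[[HealthState, str], None]


class HealthInfoPublisher:
    """Hypothetical sketch: HealthState and HealthInfo change together, events only on change."""

    def __init__(self) -> None:
        self.state = HealthState.UNKNOWN
        self.info = "[]"
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback) -> None:
        self._subscribers.append(callback)

    def update(self, state: HealthState, problems: Dict[str, List[str]]) -> None:
        # Keep only devices that actually report messages.
        problems = {dev: msgs for dev, msgs in problems.items() if msgs}
        # HealthInfo is empty when HealthState is OK (or when nothing is reported).
        if state is HealthState.OK or not problems:
            info = "[]"
        else:
            info = json.dumps(problems, indent=4)
        # On-change semantics: publish nothing if neither value changed.
        if state == self.state and info == self.info:
            return
        # Update both values before notifying, so subscribers never see them out of sync.
        self.state, self.info = state, info
        for callback in self._subscribers:
            callback(state, info)


events = []
publisher = HealthInfoPublisher()
publisher.subscribe(lambda state, info: events.append((state, info)))
sdp_issue = {"low-tmc/subarray-leaf-node-sdp/01": ["Liveliness check failed for SDP"]}
publisher.update(HealthState.FAILED, sdp_issue)  # event published
publisher.update(HealthState.FAILED, sdp_issue)  # unchanged: no event
publisher.update(HealthState.OK, {})             # event published, HealthInfo back to []
```

Suppressing the repeated FAILED update is what makes the event "on-change": subscribers react to transitions rather than to every internal re-evaluation.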

Reporting Format

When problems are reported, HealthInfo is a JSON object (dictionary) where:

  • Keys = Tango device names of the affected leaf nodes

  • Values = list of failure/diagnostic messages (strings) describing each problem

Example when problems exist:

{
    "low-tmc/subarray-leaf-node-csp/01": [
        "CSP Subarray Health State: FAILED",
        "Delay Model Exception."
    ],
    "low-tmc/subarray-leaf-node-sdp/01": [
        "Liveliness check failed for SDP"
    ],
    "low-tmc/subarray-leaf-node-mccs/01": [
        "MCCS Subarray Health State: UNKNOWN"
    ]
}

Example when there are no issues:

[]
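Because the no-issues case is published as [] rather than as an empty object, a client reading HealthInfo may want to normalise both forms into one dictionary before inspecting it. A minimal sketch, assuming the value arrives as a JSON string; parse_health_info is a hypothetical helper, not part of TMC:

```python
import json
from typing import Dict, List


def parse_health_info(raw: str) -> Dict[str, List[str]]:
    """Hypothetical helper: turn a HealthInfo JSON string into {device: [messages]}."""
    value = json.loads(raw)
    # The no-issues case is documented as [], so treat any empty value as "no problems".
    if not value:
        return {}
    if not isinstance(value, dict):
        raise ValueError(f"unexpected HealthInfo payload: {raw!r}")
    result: Dict[str, List[str]] = {}
    for device, messages in value.items():
        # Each device should map to a list of message strings.
        if not isinstance(messages, list):
            raise ValueError(f"messages for {device} must be a list")
        result[device] = [str(message) for message in messages]
    return result
```

With this normalisation, "no problems" is simply an empty dictionary, so callers can test `if health_info:` regardless of which empty form was published.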

What You Will See as an Operator

  • Subarray level: HealthInfo aggregates the problems reported by the leaf nodes (CSP, SDP, MCCS) together with any issues TMC detects itself (e.g. a failed liveliness check).

  • Leaf node level: more detailed reasons, available by reading HealthInfo directly from the relevant leaf node device.

  • Clear mapping of which device/subsystem is affected and why.
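The subarray-level aggregation described above can be sketched as a per-device merge of the messages each leaf node reports with any issues TMC detects itself, such as a failed liveliness check. This is an illustrative sketch only: aggregate_health_info and its two input dictionaries are hypothetical, and the actual TMC aggregation logic may differ.

```python
from typing import Dict, List

Report = Dict[str, List[str]]


def aggregate_health_info(leaf_reports: Report, internal_issues: Report) -> Report:
    """Hypothetical sketch: merge leaf node messages with TMC-detected issues per device."""
    merged: Report = {}
    for source in (leaf_reports, internal_issues):
        for device, messages in source.items():
            # A device with no messages never creates an entry, so healthy nodes are omitted.
            for message in messages:
                bucket = merged.setdefault(device, [])
                # Avoid repeating a message already recorded for this device.
                if message not in bucket:
                    bucket.append(message)
    return merged


leaf = {
    "low-tmc/subarray-leaf-node-csp/01": ["CSP Subarray Health State: FAILED", "Delay Model Exception."],
    "low-tmc/subarray-leaf-node-sdp/01": [],
}
internal = {"low-tmc/subarray-leaf-node-sdp/01": ["Liveliness check failed for SDP"]}
subarray_info = aggregate_health_info(leaf, internal)
```

Keeping the leaf node device name as the key for TMC-internal issues preserves the device-to-problem mapping that operators rely on.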

Use HealthInfo to:

  • Quickly identify which subsystem(s) caused a FAILED or DEGRADED HealthState

  • Understand whether the issue is external (subsystem) or internal (TMC-detected)

  • Guide deeper investigation (e.g. go to the failing subsystem’s own HealthInfo or logs)
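For quick triage, the affected subsystems can be read straight from the HealthInfo keys. A minimal sketch, assuming leaf node device names follow the pattern in the example above (e.g. low-tmc/subarray-leaf-node-csp/01); the affected_subsystems helper and the naming pattern it relies on are assumptions, not a documented TMC interface:

```python
import re
from typing import Dict, List

# Assumption: device names look like "low-tmc/subarray-leaf-node-csp/01",
# where the subsystem name ("csp") follows "subarray-leaf-node-".
LEAF_NODE_PATTERN = re.compile(r"subarray-leaf-node-([a-z]+)/")


def affected_subsystems(health_info: Dict[str, List[str]]) -> List[str]:
    """Hypothetical helper: sorted, upper-case subsystem names that report at least one message."""
    subsystems = set()
    for device, messages in health_info.items():
        if not messages:
            continue
        match = LEAF_NODE_PATTERN.search(device)
        # Fall back to the full device name when it does not match the assumed pattern.
        subsystems.add(match.group(1).upper() if match else device)
    return sorted(subsystems)


example = {
    "low-tmc/subarray-leaf-node-csp/01": ["CSP Subarray Health State: FAILED"],
    "low-tmc/subarray-leaf-node-sdp/01": ["Liveliness check failed for SDP"],
    "low-tmc/subarray-leaf-node-mccs/01": ["MCCS Subarray Health State: UNKNOWN"],
}
print(affected_subsystems(example))  # ['CSP', 'MCCS', 'SDP']
```

The list only tells you where to look next; the messages themselves, and the subsystem's own HealthInfo or logs, still indicate whether the cause is external or TMC-detected.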

For diagrams and more detailed system context, see:

HealthInfo Reporting Mechanism Diagram