Version: 1.0.0

Orphans & cycles

Two graph-shape problems show up with depressing regularity in real clusters: orphans (resources nothing depends on, that shouldn't be roots) and cycles (dependency loops that shouldn't exist at all). KubeAtlas surfaces both as first-class endpoints so they're discoverable from a dashboard or a CI gate.

Orphans (F-112 part 1)

An orphan is a resource that:

  • has zero incoming edges, AND
  • is not a top-level kind (a kind that conventionally has zero incoming edges by design — Namespace, Node, Deployment, etc.).

Pods are a special case: a Pod with no ownerReferences is flagged as standalone_pod rather than orphan, because many users create ad-hoc Pods with kubectl run on purpose. The reason field lets dashboards render different copy for each case.
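The rules above can be sketched as a small classifier. This is a minimal illustration, not KubeAtlas's implementation: the Resource type, its has_owner_refs field, and the classify function are hypothetical names, and the sketch assumes the Pod rule is checked before the generic zero-incoming-edges rule. The kind set is copied from the top-level whitelist on this page.

```python
from dataclasses import dataclass

# Kinds taken from the "Top-level whitelist" section of this page.
TOP_LEVEL_KINDS = frozenset({
    "Namespace", "Node", "PersistentVolume", "StorageClass", "ClusterRole",
    "ClusterRoleBinding", "CustomResourceDefinition", "Deployment",
    "StatefulSet", "DaemonSet", "Service", "Ingress", "Gateway", "HTTPRoute",
    "ConfigMap", "Secret", "ServiceAccount", "Role", "RoleBinding", "Job",
    "CronJob", "PersistentVolumeClaim", "NetworkPolicy",
})


@dataclass(frozen=True)
class Resource:
    """Hypothetical node type; has_owner_refs mirrors metadata.ownerReferences."""
    kind: str
    namespace: str
    name: str
    has_owner_refs: bool = False


def classify(resources, edges):
    """Return {resource: reason} for every flagged resource.

    edges is an iterable of (source, target) Resource pairs.
    """
    incoming = {r: 0 for r in resources}
    for _src, dst in edges:
        if dst in incoming:
            incoming[dst] += 1

    reports = {}
    for r in resources:
        # Assumption: the Pod special case wins over the generic rule.
        if r.kind == "Pod" and not r.has_owner_refs:
            reports[r] = "standalone_pod"
        elif incoming[r] == 0 and r.kind not in TOP_LEVEL_KINDS:
            reports[r] = "orphan"
    return reports
```

With a Deployment that owns a ReplicaSet that owns a Pod, plus an unowned ReplicaSet and an unowned Pod, only the last two are flagged.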

API

GET /api/v1/orphans
GET /api/v1/orphans?namespace=demo

{
  "reports": [
    {
      "resource": { "kind": "ReplicaSet", "namespace": "demo", "name": "ghost-rs" },
      "reason": "orphan"
    },
    {
      "resource": { "kind": "Pod", "namespace": "demo", "name": "lonely" },
      "reason": "standalone_pod"
    }
  ],
  "count": 2
}
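As a rough sketch of how a dashboard might consume this response, the snippet below fetches the report and groups resources by reason so each group can get its own copy. BASE_URL, fetch_orphans, and group_by_reason are hypothetical names; only the endpoint path, the namespace query parameter, and the response fields come from the example above.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

BASE_URL = "http://localhost:8080"  # hypothetical KubeAtlas address


def fetch_orphans(namespace=None, base_url=BASE_URL):
    """GET /api/v1/orphans, optionally filtered by namespace."""
    url = base_url + "/api/v1/orphans"
    if namespace:
        url += "?" + urllib.parse.urlencode({"namespace": namespace})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def group_by_reason(payload):
    """Map each reason to a list of 'Kind/namespace/name' strings."""
    grouped = defaultdict(list)
    for report in payload["reports"]:
        res = report["resource"]
        # Fall back to "-" in case a resource carries no namespace.
        namespace = res.get("namespace") or "-"
        grouped[report["reason"]].append(
            f'{res["kind"]}/{namespace}/{res["name"]}'
        )
    return dict(grouped)
```

Calling group_by_reason(fetch_orphans("demo")) would then give one list per reason, which a UI can label separately.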

Top-level whitelist

These kinds never appear in the orphans list, no matter what their incoming-edge count is:

  • Cluster-scoped roots: Namespace, Node, PersistentVolume, StorageClass, ClusterRole, ClusterRoleBinding, CustomResourceDefinition.
  • Namespaced kinds users / GitOps systems author directly: Deployment, StatefulSet, DaemonSet, Service, Ingress, Gateway, HTTPRoute, ConfigMap, Secret, ServiceAccount, Role, RoleBinding, Job, CronJob, PersistentVolumeClaim, NetworkPolicy.

Anything else with zero incoming edges is suspect — typical catches:

  • A ReplicaSet whose Deployment was deleted with --cascade=orphan.
  • A Job template (a CronJob's child Job that lost its CronJob).
  • A custom resource whose owner CRD was uninstalled.

What orphans does not tell you

  • It doesn't say why the upstream went away. The graph encodes the current state, not the history; pair with kubectl get events or your audit log if you need the cause.
  • It doesn't auto-clean. KubeAtlas is read-only by design. Removing the resource is a kubectl delete you make consciously after seeing the report.

Cycles (F-112 part 2)

A cycle is a strongly connected component (SCC) of two or more resources. Single-vertex SCCs are not reported, and that includes resources that point at themselves: such self-loops are either extractor misfires or legitimate self-references and would only spam dashboards.

In a healthy cluster the cycles list is empty. Anything non-empty is an investigate-immediately signal: K8s won't allow OwnerReference cycles by construction, so a non-empty cycle list means an extractor is over-reaching, a custom resource has a genuine config error, or someone has been hand-editing references.

API

GET /api/v1/cycles

{
  "cycles": [
    {
      "members": [
        { "kind": "ConfigMap", "namespace": "demo", "name": "a" },
        { "kind": "ConfigMap", "namespace": "demo", "name": "b" }
      ]
    }
  ],
  "count": 1
}

Members within a cycle are sorted by ID for diff stability; multiple disjoint cycles each get their own object.

Algorithm

Tarjan's SCC algorithm, which runs in O(V + E). The playbook prescribes this specifically over a hand-rolled DFS + visited set: the textbook implementation is tested for correctness, and the measured runtime on 5K-vertex / 5K-edge graphs is ~80ms with the race detector enabled, well under the 200ms target.

Dangling edges (target node not in the snapshot) are dropped silently before Tarjan runs so the algorithm sees a closed vertex set.
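The pipeline described here (drop dangling edges, run Tarjan, keep only SCCs of two or more members, sort members) can be sketched as follows. This is an illustrative Python version, not the project's code: find_cycles is a hypothetical name, node IDs are assumed to be sortable values such as strings, and the DFS is written iteratively so deep graphs do not hit Python's recursion limit.

```python
def find_cycles(nodes, edges):
    """Return sorted SCCs with two or more members.

    nodes: iterable of hashable, sortable IDs.
    edges: iterable of (src, dst) ID pairs.
    """
    node_set = set(nodes)
    adj = {n: [] for n in node_set}
    for src, dst in edges:
        # Drop dangling edges so Tarjan sees a closed vertex set.
        if src in node_set and dst in node_set:
            adj[src].append(dst)

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    cycles = []
    counter = 0

    for root in sorted(node_set):
        if root in index:
            continue
        index[root] = lowlink[root] = counter
        counter += 1
        stack.append(root)
        on_stack.add(root)
        work = [(root, iter(adj[root]))]  # explicit DFS call stack
        while work:
            v, successors = work[-1]
            descended = False
            for w in successors:
                if w not in index:
                    # Tree edge: visit w before finishing v.
                    index[w] = lowlink[w] = counter
                    counter += 1
                    stack.append(w)
                    on_stack.add(w)
                    work.append((w, iter(adj[w])))
                    descended = True
                    break
                if w in on_stack:
                    lowlink[v] = min(lowlink[v], index[w])
            if descended:
                continue
            work.pop()  # v is finished
            if work:
                parent = work[-1][0]
                lowlink[parent] = min(lowlink[parent], lowlink[v])
            if lowlink[v] == index[v]:
                # v is the root of an SCC: pop its members off the stack.
                scc = []
                while True:
                    w = stack.pop()
                    on_stack.discard(w)
                    scc.append(w)
                    if w == v:
                        break
                if len(scc) >= 2:  # single-vertex SCCs are not reported
                    cycles.append(sorted(scc))
    return sorted(cycles)
```

Sorting both the members and the list of cycles keeps the output stable across runs, which matches the diff-stability goal mentioned above.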

Folded into resource detail (/api/v1/...)

The v1 surface carries isOrphan and inCycle booleans on the resource-detail bundle so the UI can render badges per row without a follow-up round-trip. See Blast radius.

CI gate

Two sample uses worth knowing about:

  • A scheduled job that hits /api/v1/cycles and pages oncall when count > 0. False positives should never happen — if one fires, the cluster has a real problem.
  • A pre-prod CI step that hits /api/v1/orphans?namespace=... for the namespace under test, and fails the build when the report is non-empty. Catches "PR removed the Deployment but forgot the Service" classes of mistakes early.
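Both checks can be combined in one small script. The sketch below is one possible shape under stated assumptions: BASE_URL, get_json, gate_failures, and main are hypothetical names, and the exit-code convention (non-zero on any finding) is a choice made for this example rather than something KubeAtlas prescribes.

```python
import json
import sys
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical KubeAtlas address


def get_json(path, params=None, base_url=BASE_URL):
    """GET a KubeAtlas endpoint and decode the JSON body."""
    url = base_url + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def gate_failures(orphans_payload, cycles_payload):
    """Return human-readable failure lines; an empty list means the gate passes."""
    failures = []
    if cycles_payload["count"] > 0:
        failures.append(f'cycles: {cycles_payload["count"]} found')
    for report in orphans_payload["reports"]:
        res = report["resource"]
        failures.append(
            f'{report["reason"]}: {res["kind"]} {res["namespace"]}/{res["name"]}'
        )
    return failures


def main(namespace):
    """Run both checks for one namespace and return a process exit code."""
    orphans = get_json("/api/v1/orphans", {"namespace": namespace})
    cycles = get_json("/api/v1/cycles")
    failures = gate_failures(orphans, cycles)
    for line in failures:
        print(line, file=sys.stderr)
    return 1 if failures else 0
```

A CI step could end with sys.exit(main("demo")), so the build fails whenever either report is non-empty. Teams that create standalone Pods on purpose may prefer to skip reports whose reason is standalone_pod.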

The integration test in test/verify/phase2.sh (Part 3 / 4) exercises both endpoints on a fixture cluster — the orphan path applies a ghost-rs ReplicaSet and confirms it appears; the cycle path confirms the endpoint stays empty on a healthy fixture.

What if the orphans list is wrong on my cluster

The most common cause: you have a custom resource whose owner is not recorded as a standard ownerReferences link. The OWNS extractor only looks at metadata.ownerReferences; if your custom resource encodes its parent in spec.ownerName or similar, write a Rego rule that emits the OWNS edge. See Rego rules.

Once the rule is loaded, the orphans report will start treating those resources as having an upstream and they'll fall out of the list automatically.