Checks
Controller Version
0.14.1
Deployment Method
Helm
To Reproduce
- Deploy ARC v0.14.1 via Helm (`gha-runner-scale-set-controller` chart)
- Observe listener pods in the `arc-systems` namespace across one or more `AutoscalingRunnerSet` resources
- Within ~30 minutes, listener pods begin terminating with a non-zero exit code
- The ARC controller detects the termination and recreates the pod, which crashes again — producing a continuous crash-loop
Describe the bug
After upgrading ARC from v0.14.0 to v0.14.1, listener pods across multiple runner scale sets enter a continuous crash-loop. The controller repeatedly logs:
```
Listener pod is terminated {"reason": "Error", "message": ""}
```
The listener exits with a non-zero status code when it receives an EOF response from the GitHub Actions broker service (`broker.actions.githubusercontent.com`). This was introduced by the `scaleset` library upgrade from v0.2.0 → v0.3.0, specifically the "Restore job acquisition flow" change. In v0.3.0, an EOF from the broker is treated as a fatal error instead of being handled gracefully with a retry.
The listener logs confirm `scaleset@v0.3.0` is in use:

```
time=... level=INFO source=github.com/actions/scaleset@v0.3.0/listener/listener.go:179 msg="Getting next message" component=listener lastMessageID=<id>
[DEBUG] GET https://broker.actions.githubusercontent.com/scalesets/message?lastMessageId=<id>
```
After the listener exits, the ARC controller recreates the pod, which crashes again on the next EOF — resulting in an endless restart loop. From our controller logs, we observed 4 listener pods across 2 production clusters accumulating 36 error terminations within a ~40-minute window.
Note: `scaleset` v0.4.0 does not fix this. It addresses a different (message ordering) issue. The EOF-on-exit regression is still present in v0.4.0.
Describe the expected behavior
The listener should treat EOF responses from the GitHub broker as transient/recoverable conditions and retry — as it did in `scaleset` v0.2.0 (ARC v0.14.0). It should not exit with a non-zero status code on EOF.
Additional Context
```yaml
# gha-runner-scale-set-controller values
image:
  tag: "0.14.1"
```

```yaml
# gha-runner-scale-set values (representative — same pattern for all affected scale sets)
githubConfigUrl: "https://github.com/<org>"
minRunners: 0
maxRunners: <N>
```
Kubernetes server version: v1.35.2-eks-f69f56f
Controller Logs
https://gist.github.com/jgutierrezglez/c2889a9444c3efbe4d6b4908642d8430
Runner Pod Logs
The listener pod logs from the crashing containers are not recoverable. ARC's controller deletes and recreates the listener pod on each termination — once deleted, Kubernetes does not retain the previous container's logs for the new pod. The controller logs in the gist above are the primary evidence of the crash-loop behavior.
The currently-running listener pod logs (same ARC version, stable at the moment) are also included in the gist and confirm `scaleset@v0.3.0` is in use, along with the polling loop pattern that crashes on EOF.