What happened?
After upgrading from Helm v4.2.0 to v4.2.1, any helm upgrade --install --wait that deletes a resource and waits for it to be gone sits there for the entire --timeout instead of returning as soon as the resource is actually deleted.
It does not error out. After the full timeout elapses it just continues and the release reports success. So an install that used to take a couple of minutes now takes 10+ minutes per deletion wait, but still exits 0.
I hit this with hook resources that use helm.sh/hook-delete-policy: before-hook-creation. On each upgrade Helm deletes the old hook object and waits for it to disappear before recreating it. The object is gone almost immediately, but the wait runs the full clock.
What did you expect to happen?
It should not be hanging.
How can we reproduce it (as minimally and precisely as possible)?
Evidence
Same cluster, same chart, same hooks. The only thing that changed between the two runs is the Helm patch version.
v4.2.0 (works, deletion detected in ~0.1s):
level=DEBUG msg="waiting for resources to be deleted" count=1 timeout=10m0s
level=DEBUG msg="all resources achieved desired status" desiredStatus=NotFound resourceCount=0
v4.2.1 (hangs the full 10m, then proceeds):
08:14:35 level=DEBUG msg="starting delete resource" ... kind=ClusterIssuer
08:14:35 level=DEBUG msg="waiting for resources to be deleted" count=1 timeout=10m0s
08:24:35 level=DEBUG msg="updating release" ... <-- exactly 10 minutes later
08:24:36 level=DEBUG msg="Created resource via patch" ...
Three before-hook-creation hooks in a row, each waiting the full 10m, turned a ~2 minute install into a ~31 minute one. None of the individual waits exceeded --timeout, so the command never failed, it just crawled.
Root cause
This looks like a side effect of #32081 (merged May 30, shipped in v4.2.1), which changed WaitForDelete / statusObserver in the kube package.
That PR was fixing a flaky test: during informer sync, resources briefly show as Unknown, and the old observer would cancel the watch too early on that, causing intermittent timeouts in the full test suite. The fix defers the early-cancel decision when every resource is Unknown and nothing has a definitive status yet, and it treats "timeout while everything is Unknown or NotFound" as not-an-error.
The problem is that a quick deletion looks exactly like that case. The resource really is gone, but the watcher reports Unknown/NotFound during sync, so the new logic defers instead of recognizing the deletion and waits until the context deadline (the --timeout). When the deadline hits with everything NotFound, it is treated as success rather than failure, which is why the wait silently eats the full timeout and then continues.
So it fixed a premature-exit race in tests and introduced a delayed-exit latency bug in real deletions. v4.2.0 exited in ~0.1s, v4.2.1 waits the entire timeout.
Reproduce
- Install a chart with a hook that uses
helm.sh/hook-delete-policy: before-hook-creation.
helm upgrade --install --wait --timeout 10m <release> <chart> a second time so the old hook object gets deleted and recreated.
- Watch the deletion wait sit for the full 10m on v4.2.1. On v4.2.0 it returns immediately.
Workaround
Pin to v4.2.0. That restores the fast behavior with no other changes.
References
Helm version
Details
$ helm version
version.BuildInfo{Version:"v4.2.1", GitCommit:"d591a19b953bd9cfdf7d9ddd83c2f4ffdaeafb29", GitTreeState:"clean", GoVersion:"go1.26.4", KubeClientVersion:"v1.36"}
Kubernetes version
Details
$ kubectl version
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.34.8
What happened?
After upgrading from Helm v4.2.0 to v4.2.1, any
helm upgrade --install --waitthat deletes a resource and waits for it to be gone sits there for the entire--timeoutinstead of returning as soon as the resource is actually deleted.It does not error out. After the full timeout elapses it just continues and the release reports success. So an install that used to take a couple of minutes now takes 10+ minutes per deletion wait, but still exits 0.
I hit this with hook resources that use
helm.sh/hook-delete-policy: before-hook-creation. On each upgrade Helm deletes the old hook object and waits for it to disappear before recreating it. The object is gone almost immediately, but the wait runs the full clock.What did you expect to happen?
It should not be hanging.
How can we reproduce it (as minimally and precisely as possible)?
Evidence
Same cluster, same chart, same hooks. The only thing that changed between the two runs is the Helm patch version.
v4.2.0 (works, deletion detected in ~0.1s):
v4.2.1 (hangs the full 10m, then proceeds):
Three
before-hook-creationhooks in a row, each waiting the full 10m, turned a ~2 minute install into a ~31 minute one. None of the individual waits exceeded--timeout, so the command never failed, it just crawled.Root cause
This looks like a side effect of #32081 (merged May 30, shipped in v4.2.1), which changed
WaitForDelete/statusObserverin thekubepackage.That PR was fixing a flaky test: during informer sync, resources briefly show as
Unknown, and the old observer would cancel the watch too early on that, causing intermittent timeouts in the full test suite. The fix defers the early-cancel decision when every resource isUnknownand nothing has a definitive status yet, and it treats "timeout while everything isUnknownorNotFound" as not-an-error.The problem is that a quick deletion looks exactly like that case. The resource really is gone, but the watcher reports
Unknown/NotFoundduring sync, so the new logic defers instead of recognizing the deletion and waits until the context deadline (the--timeout). When the deadline hits with everythingNotFound, it is treated as success rather than failure, which is why the wait silently eats the full timeout and then continues.So it fixed a premature-exit race in tests and introduced a delayed-exit latency bug in real deletions. v4.2.0 exited in ~0.1s, v4.2.1 waits the entire timeout.
Reproduce
helm.sh/hook-delete-policy: before-hook-creation.helm upgrade --install --wait --timeout 10m <release> <chart>a second time so the old hook object gets deleted and recreated.Workaround
Pin to v4.2.0. That restores the fast behavior with no other changes.
References
Helm version
Details
Kubernetes version
Details