--wait hangs for the full timeout on fast resource deletions in v4.2.1 (regression from v4.2.0) #32214

@walsha2

What happened?

After upgrading from Helm v4.2.0 to v4.2.1, any helm upgrade --install --wait that deletes a resource and waits for it to be gone sits there for the entire --timeout instead of returning as soon as the resource is actually deleted.

It does not error out. After the full timeout elapses it just continues and the release reports success. So an install that used to take a couple of minutes now takes 10+ minutes per deletion wait, but still exits 0.

I hit this with hook resources that use helm.sh/hook-delete-policy: before-hook-creation. On each upgrade Helm deletes the old hook object and waits for it to disappear before recreating it. The object is gone almost immediately, but the wait runs the full clock.

What did you expect to happen?

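The deletion wait should return as soon as the resource is actually gone, as it did on v4.2.0 (~0.1s), instead of blocking until --timeout expires.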

How can we reproduce it (as minimally and precisely as possible)?

Evidence

Same cluster, same chart, same hooks. The only thing that changed between the two runs is the Helm patch version.

v4.2.0 (works, deletion detected in ~0.1s):

level=DEBUG msg="waiting for resources to be deleted" count=1 timeout=10m0s
level=DEBUG msg="all resources achieved desired status" desiredStatus=NotFound resourceCount=0

v4.2.1 (hangs the full 10m, then proceeds):

08:14:35  level=DEBUG msg="starting delete resource" ... kind=ClusterIssuer
08:14:35  level=DEBUG msg="waiting for resources to be deleted" count=1 timeout=10m0s
08:24:35  level=DEBUG msg="updating release" ...                 <-- exactly 10 minutes later
08:24:36  level=DEBUG msg="Created resource via patch" ...

Three before-hook-creation hooks in a row, each waiting the full 10m, turned a ~2 minute install into a ~31 minute one. None of the individual waits exceeded --timeout, so the command never failed, it just crawled.
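To check whether a given run is affected, the gap after each deletion-wait message can be measured straight from the debug log. The following is a minimal sketch, assuming every log line carries an HH:MM:SS prefix like the v4.2.1 excerpt above; the function name and regular expression are illustrative and not part of Helm.

```python
import re
from datetime import datetime, timedelta

# Assumption: each log line starts with an HH:MM:SS wall-clock prefix,
# as in the v4.2.1 excerpt. Adjust the pattern for other log formats.
LINE_RE = re.compile(r'^(\d{2}:\d{2}:\d{2})\s+.*?msg="([^"]*)"')
WAIT_MSG = "waiting for resources to be deleted"


def deletion_wait_durations(lines):
    """Return the time between each deletion-wait message and the next log line."""
    durations = []
    wait_started = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines without a recognizable timestamp and msg
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if wait_started is not None:
            elapsed = stamp - wait_started
            if elapsed < timedelta(0):
                elapsed += timedelta(days=1)  # the wait crossed midnight
            durations.append(elapsed)
            wait_started = None
        if match.group(2) == WAIT_MSG:
            wait_started = stamp
    return durations


SAMPLE = [
    '08:14:35  level=DEBUG msg="starting delete resource" ... kind=ClusterIssuer',
    '08:14:35  level=DEBUG msg="waiting for resources to be deleted" count=1 timeout=10m0s',
    '08:24:35  level=DEBUG msg="updating release" ...',
    '08:24:36  level=DEBUG msg="Created resource via patch" ...',
]

if __name__ == "__main__":
    for duration in deletion_wait_durations(SAMPLE):
        print(duration.total_seconds())  # 600.0 for the excerpt above
```

A wait that matches the configured --timeout to the second, as here, points at the wait running to the deadline rather than at a slow deletion.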

Root cause

This looks like a side effect of #32081 (merged May 30, shipped in v4.2.1), which changed WaitForDelete / statusObserver in the kube package.

That PR was fixing a flaky test: during informer sync, resources briefly show as Unknown, and the old observer would cancel the watch too early on that, causing intermittent timeouts in the full test suite. The fix defers the early-cancel decision when every resource is Unknown and nothing has a definitive status yet, and it treats "timeout while everything is Unknown or NotFound" as not-an-error.

The problem is that a quick deletion looks exactly like that case. The resource really is gone, but the watcher reports Unknown/NotFound during sync, so the new logic defers instead of recognizing the deletion and waits until the context deadline (the --timeout). When the deadline hits with everything NotFound, it is treated as success rather than failure, which is why the wait silently eats the full timeout and then continues.

So it fixed a premature-exit race in tests and introduced a delayed-exit latency bug in real deletions. v4.2.0 exited in ~0.1s, v4.2.1 waits the entire timeout.
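The behavior described above can be modeled in a few lines. This is a toy model built only from the description in this issue, not Helm's actual statusObserver code; the Status enum, both function names, and the defer_when_all_unknown flag are hypothetical.

```python
from enum import Enum


class Status(Enum):
    UNKNOWN = "Unknown"
    NOT_FOUND = "NotFound"
    CURRENT = "Current"


def should_cancel_early(statuses, defer_when_all_unknown):
    """Decide whether a delete-wait may stop before the deadline.

    defer_when_all_unknown=False models the pre-#32081 behavior as described
    in this issue; True models the behavior after it. Both are assumptions.
    """
    definitive = [s for s in statuses if s is not Status.UNKNOWN]
    if not definitive:
        # Nothing has a definitive status yet. The old logic stops here;
        # the new logic defers and keeps waiting.
        return not defer_when_all_unknown
    return all(s is Status.NOT_FOUND for s in definitive)


def result_at_deadline(statuses):
    """Model the rule that a timeout with only Unknown/NotFound is not an error."""
    if all(s in (Status.UNKNOWN, Status.NOT_FOUND) for s in statuses):
        return "success"
    return "timeout error"


# A resource that is already gone and only ever reports Unknown:
already_gone = [Status.UNKNOWN]
print(should_cancel_early(already_gone, defer_when_all_unknown=False))  # True
print(should_cancel_early(already_gone, defer_when_all_unknown=True))   # False
print(result_at_deadline(already_gone))                                 # success
```

In this model the already-deleted resource never triggers the early cancel once deferral is on, and the deadline is then reported as success, which matches the "waits the full timeout, exits 0" symptom.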

Reproduce

  1. Install a chart with a hook that uses helm.sh/hook-delete-policy: before-hook-creation.
  2. Run helm upgrade --install --wait --timeout 10m <release> <chart> a second time, so that the old hook object is deleted and recreated.
  3. Watch the deletion wait sit for the full 10m on v4.2.1. On v4.2.0 it returns immediately.
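Because the command still exits 0, the regression only shows up in wall-clock time. A small timing wrapper makes the difference between the two versions easy to record. This is a sketch, not part of Helm; time_command is a hypothetical helper that runs whatever command it is given.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    completed = subprocess.run(argv, check=False)
    return time.monotonic() - start, completed.returncode


# Only runs a command when one is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    elapsed, code = time_command(sys.argv[1:])
    print(f"exit={code} elapsed={elapsed:.1f}s")
```

Saved under a name of your choosing (for example time_wait.py), it can wrap step 2: python time_wait.py helm upgrade --install --wait --timeout 10m <release> <chart>.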

Workaround

Pin to v4.2.0. That restores the fast behavior with no other changes.
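To keep a pipeline from silently picking up the affected release, the output of helm version can be checked before deploying. A minimal sketch, assuming the BuildInfo output format shown under "Helm version" in this report; the function names are hypothetical, and the affected set contains only v4.2.1 because that is the only version confirmed here.

```python
import re

# Assumption: v4.2.1 is the only release confirmed affected in this report.
AFFECTED_VERSIONS = {"v4.2.1"}

# Match the top-level Version field, not GoVersion or KubeClientVersion.
VERSION_RE = re.compile(r'(?<![A-Za-z])Version:"([^"]+)"')


def helm_version(build_info):
    """Extract the Helm version from `helm version` BuildInfo output, or None."""
    match = VERSION_RE.search(build_info)
    return match.group(1) if match else None


def is_affected(build_info):
    """Return True when the reported Helm version is in the affected set."""
    return helm_version(build_info) in AFFECTED_VERSIONS


SAMPLE = (
    'version.BuildInfo{Version:"v4.2.1", '
    'GitCommit:"d591a19b953bd9cfdf7d9ddd83c2f4ffdaeafb29", '
    'GitTreeState:"clean", GoVersion:"go1.26.4", KubeClientVersion:"v1.36"}'
)

if __name__ == "__main__":
    print(helm_version(SAMPLE), is_affected(SAMPLE))  # v4.2.1 True
```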

Helm version

$ helm version
version.BuildInfo{Version:"v4.2.1", GitCommit:"d591a19b953bd9cfdf7d9ddd83c2f4ffdaeafb29", GitTreeState:"clean", GoVersion:"go1.26.4", KubeClientVersion:"v1.36"}

Kubernetes version

$ kubectl version
Client Version: v1.33.1
Kustomize Version: v5.6.0
Server Version: v1.34.8
