Skip to content

fix: resolve GitHub Copilot "Retry Connection" failing with "Agent service not initialized" (#8925)#8976

Merged
mscolnick merged 3 commits into
mainfrom
ms/reconnect-copilot
Apr 1, 2026
Merged

fix: resolve GitHub Copilot "Retry Connection" failing with "Agent service not initialized" (#8925)#8976
mscolnick merged 3 commits into
mainfrom
ms/reconnect-copilot

Conversation

@mscolnick

Copy link
Copy Markdown
Contributor

Two root causes made Copilot connections unreliable:

  1. Initialize timeout clamped to 5s — The LSP initialize request needs up
    to 30s, but LazyWebsocketTransport.sendData() clamped all timeouts to
    maxTimeoutMs (5000ms). Changed to use maxTimeoutMs only as a default
    when no timeout is provided, so the base class's 30s initialize timeout is
    respected.

  2. Retry didn't re-initialize the LSP — After a connection failure,
    client.close() sets the transport's delegate to undefined. Clicking
    "Retry Connection" reconnected the websocket but never re-sent the LSP
    initialize handshake, so Copilot responded with "Agent service not
    initialized". Added an onReconnect callback on the transport that the
    client uses to re-run initialize() + sendConfiguration() after
    reconnecting.

Closes #8925

…rvice not initialized" (#8925)

Two root causes made Copilot connections unreliable:

1. **Initialize timeout clamped to 5s** — The LSP `initialize` request needs up
   to 30s, but `LazyWebsocketTransport.sendData()` clamped all timeouts to
   `maxTimeoutMs` (5000ms). Changed to use `maxTimeoutMs` only as a default
   when no timeout is provided, so the base class's 30s initialize timeout is
   respected.

2. **Retry didn't re-initialize the LSP** — After a connection failure,
   `client.close()` sets the transport's delegate to `undefined`. Clicking
   "Retry Connection" reconnected the websocket but never re-sent the LSP
   `initialize` handshake, so Copilot responded with "Agent service not
   initialized". Added an `onReconnect` callback on the transport that the
   client uses to re-run `initialize()` + `sendConfiguration()` after
   reconnecting.

Closes #8925
Copilot AI review requested due to automatic review settings April 1, 2026 18:32
@vercel

vercel Bot commented Apr 1, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
marimo-docs Ready Ready Preview, Comment Apr 1, 2026 6:50pm

Request Review

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes unreliable GitHub Copilot LSP connections in the CodeMirror integration by ensuring the LSP initialize request can use longer timeouts and by re-running initialization/configuration after websocket reconnection so “Retry Connection” succeeds.

Changes:

  • Stop clamping all JSON-RPC timeouts to maxTimeoutMs; use it only as a default when no timeout is provided.
  • Add an onReconnect hook to LazyWebsocketTransport and wire it to re-run LSP initialization + configuration after reconnect.
  • Extend transport unit tests to cover the new timeout semantics and onReconnect behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
frontend/src/core/codemirror/copilot/transport.ts Adds reconnect-triggered reinitialization hook and updates timeout handling.
frontend/src/core/codemirror/copilot/language-server.ts Adds reInitialize() helper to rerun initialize/config steps.
frontend/src/core/codemirror/copilot/client.ts Wires transport reconnection to client re-initialization.
frontend/src/core/codemirror/copilot/tests/transport.test.ts Updates/expands tests for new timeout and reconnect behavior.

Comment on lines +217 to +221
this.needsReInitialization = false;
Logger.log(
"Copilot#sendData: Re-initializing LSP after reconnection...",
);
await this.onReconnect();

Copilot AI Apr 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needsReInitialization is set to false before awaiting onReconnect(). If onReconnect throws/rejects (e.g., initialize fails transiently), the transport will think it’s re-initialized and won’t retry on subsequent sendData, leaving the client stuck until another close() or full reload. Consider only clearing the flag after onReconnect succeeds (or restoring it in a catch/finally).

Suggested change
this.needsReInitialization = false;
Logger.log(
"Copilot#sendData: Re-initializing LSP after reconnection...",
);
await this.onReconnect();
Logger.log(
"Copilot#sendData: Re-initializing LSP after reconnection...",
);
try {
await this.onReconnect();
this.needsReInitialization = false;
} catch (error) {
Logger.error(
"Copilot#sendData: LSP re-initialization after reconnection failed",
error,
);
throw error;
}

Copilot uses AI. Check for mistakes.
Comment on lines 201 to 223
// If delegate is undefined, try to reconnect
if (!this.delegate) {
Logger.log("Copilot#sendData: Delegate not initialized, reconnecting...");
try {
// Ensure prerequisites are ready before attempting connection
await this.options.waitForReady();
await this.tryConnect();
} catch (error) {
Logger.error("Copilot#sendData: Failed to reconnect transport", error);
throw new Error(
"Unable to connect to GitHub Copilot. Please check your settings and try again.",
);
}

// Re-run LSP initialization handshake after reconnecting
if (this.needsReInitialization && this.onReconnect) {
this.needsReInitialization = false;
Logger.log(
"Copilot#sendData: Re-initializing LSP after reconnection...",
);
await this.onReconnect();
}
}

Copilot AI Apr 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-initialization is only performed inside the if (!this.delegate) reconnect path. If multiple sendData calls happen concurrently during a reconnect, one call can create/connect the delegate while onReconnect() is still running, and another call may observe delegate as defined and proceed to send requests before the LSP initialize handshake completes (risking the same “Agent service not initialized” errors). Consider guarding reconnect/re-init with a shared promise/lock (similar to connectionPromise in ReconnectingWebSocketTransport) and/or ensuring needsReInitialization is checked even when delegate is already set.

Copilot uses AI. Check for mistakes.
await transport.sendData(data, 5000);

expect(onReconnect).toHaveBeenCalledTimes(1);
});

Copilot AI Apr 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new onReconnect behavior isn’t exercised for failure cases. In ReconnectingWebSocketTransport there’s coverage asserting onReconnect rejections propagate; adding a similar test here would help prevent regressions (especially around retrying when onReconnect fails).

Suggested change
});
});
it("should propagate onReconnect rejection during reconnection via sendData", async () => {
const transport = new LazyWebsocketTransport({
getWsUrl: mockGetWsUrl,
waitForReady: mockWaitForReady,
showError: mockShowError,
});
const reconnectError = new Error("onReconnect failed");
const onReconnect = vi.fn().mockRejectedValue(reconnectError);
transport.onReconnect = onReconnect;
// Initial connect should not trigger onReconnect
await transport.connect();
expect(onReconnect).not.toHaveBeenCalled();
// Close to force reconnection on next sendData
transport.close();
const data: any = { method: "test", params: [] };
// When reconnection occurs, onReconnect rejection should propagate
await expect(transport.sendData(data, 5000)).rejects.toThrow("onReconnect failed");
expect(onReconnect).toHaveBeenCalledTimes(1);
});

Copilot uses AI. Check for mistakes.
@mscolnick mscolnick added the bug Something isn't working label Apr 1, 2026
- Move needsReInitialization=false after onReconnect succeeds so a
  failed re-init can be retried on the next sendData call
- Close the uninitialized delegate on onReconnect failure so the next
  attempt starts with a fresh connection
- Update maxTimeoutMs JSDoc to reflect it's now a default, not a clamp
- Add test for onReconnect rejection propagation and retry
- Cast ReconnectingWebSocket to IConnectionTransport to fix readyState
  type mismatch (number vs 0|1|2|3)
- Remove @ts-expect-error for Uint8Array.fromBase64/toBase64 since tsgo
  with ESNext lib recognizes these natively
- Use `"toBase64" in Uint8Array.prototype` instead of truthiness check
  to avoid "condition always true" error
- Use this.close() in transport catch block to avoid narrowing issue

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

frontend/src/core/codemirror/copilot/transport.ts:44

  • maxTimeoutMs is now documented/used as a default timeout rather than a maximum, but the option name still reads like an upper bound. This is easy to misinterpret for future callers; consider renaming (with a backwards-compatible alias) or clarifying in the JSDoc that the name is historical and no longer clamps caller-provided timeouts.
  /**
   * Default timeout for sendData operations in milliseconds.
   * Used when the caller does not provide an explicit timeout.
   * @default 5000
   */
  maxTimeoutMs?: number;
}

Comment on lines 33 to 44
@@ -38,7 +40,7 @@ function createConnectionTransport(
// long timeout -- the server can become slow when many notebooks
// are open.
connectionTimeout: 10_000,
});
}) as unknown as IConnectionTransport;
}

Copilot AI Apr 1, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The as unknown as IConnectionTransport cast disables structural type checking for the whole transport instance, which can mask future API drift (e.g., if partysocket/ws changes the reconnect signature). If the only incompatibility is readyState being typed as number, consider a narrower approach (e.g., a small wrapper or adjusting IConnectionTransport.readyState typing) so other members remain type-checked.

Copilot uses AI. Check for mistakes.
@mscolnick mscolnick merged commit 7c3cc60 into main Apr 1, 2026
31 of 33 checks passed
@mscolnick mscolnick deleted the ms/reconnect-copilot branch April 1, 2026 19:14
@github-actions

github-actions Bot commented Apr 1, 2026

Copy link
Copy Markdown

🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.22.1-dev20

VishakBaddur pushed a commit to VishakBaddur/marimo that referenced this pull request Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Marimo sporadically fails to connect to GitHub Copilot

2 participants