fix: resolve GitHub Copilot "Retry Connection" failing with "Agent service not initialized" (#8925)#8976
Conversation
…rvice not initialized" (#8925) Two root causes made Copilot connections unreliable: 1. **Initialize timeout clamped to 5s** — The LSP `initialize` request needs up to 30s, but `LazyWebsocketTransport.sendData()` clamped all timeouts to `maxTimeoutMs` (5000ms). Changed to use `maxTimeoutMs` only as a default when no timeout is provided, so the base class's 30s initialize timeout is respected. 2. **Retry didn't re-initialize the LSP** — After a connection failure, `client.close()` sets the transport's delegate to `undefined`. Clicking "Retry Connection" reconnected the websocket but never re-sent the LSP `initialize` handshake, so Copilot responded with "Agent service not initialized". Added an `onReconnect` callback on the transport that the client uses to re-run `initialize()` + `sendConfiguration()` after reconnecting. Closes #8925
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
Fixes unreliable GitHub Copilot LSP connections in the CodeMirror integration by ensuring the LSP initialize request can use longer timeouts and by re-running initialization/configuration after websocket reconnection so “Retry Connection” succeeds.
Changes:
- Stop clamping all JSON-RPC timeouts to
maxTimeoutMs; use it only as a default when no timeout is provided. - Add an
onReconnecthook toLazyWebsocketTransportand wire it to re-run LSP initialization + configuration after reconnect. - Extend transport unit tests to cover the new timeout semantics and
onReconnectbehavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| frontend/src/core/codemirror/copilot/transport.ts | Adds reconnect-triggered reinitialization hook and updates timeout handling. |
| frontend/src/core/codemirror/copilot/language-server.ts | Adds reInitialize() helper to rerun initialize/config steps. |
| frontend/src/core/codemirror/copilot/client.ts | Wires transport reconnection to client re-initialization. |
| frontend/src/core/codemirror/copilot/tests/transport.test.ts | Updates/expands tests for new timeout and reconnect behavior. |
| this.needsReInitialization = false; | ||
| Logger.log( | ||
| "Copilot#sendData: Re-initializing LSP after reconnection...", | ||
| ); | ||
| await this.onReconnect(); |
There was a problem hiding this comment.
needsReInitialization is set to false before awaiting onReconnect(). If onReconnect throws/rejects (e.g., initialize fails transiently), the transport will think it’s re-initialized and won’t retry on subsequent sendData, leaving the client stuck until another close() or full reload. Consider only clearing the flag after onReconnect succeeds (or restoring it in a catch/finally).
| this.needsReInitialization = false; | |
| Logger.log( | |
| "Copilot#sendData: Re-initializing LSP after reconnection...", | |
| ); | |
| await this.onReconnect(); | |
| Logger.log( | |
| "Copilot#sendData: Re-initializing LSP after reconnection...", | |
| ); | |
| try { | |
| await this.onReconnect(); | |
| this.needsReInitialization = false; | |
| } catch (error) { | |
| Logger.error( | |
| "Copilot#sendData: LSP re-initialization after reconnection failed", | |
| error, | |
| ); | |
| throw error; | |
| } |
| // If delegate is undefined, try to reconnect | ||
| if (!this.delegate) { | ||
| Logger.log("Copilot#sendData: Delegate not initialized, reconnecting..."); | ||
| try { | ||
| // Ensure prerequisites are ready before attempting connection | ||
| await this.options.waitForReady(); | ||
| await this.tryConnect(); | ||
| } catch (error) { | ||
| Logger.error("Copilot#sendData: Failed to reconnect transport", error); | ||
| throw new Error( | ||
| "Unable to connect to GitHub Copilot. Please check your settings and try again.", | ||
| ); | ||
| } | ||
|
|
||
| // Re-run LSP initialization handshake after reconnecting | ||
| if (this.needsReInitialization && this.onReconnect) { | ||
| this.needsReInitialization = false; | ||
| Logger.log( | ||
| "Copilot#sendData: Re-initializing LSP after reconnection...", | ||
| ); | ||
| await this.onReconnect(); | ||
| } | ||
| } |
There was a problem hiding this comment.
Re-initialization is only performed inside the if (!this.delegate) reconnect path. If multiple sendData calls happen concurrently during a reconnect, one call can create/connect the delegate while onReconnect() is still running, and another call may observe delegate as defined and proceed to send requests before the LSP initialize handshake completes (risking the same “Agent service not initialized” errors). Consider guarding reconnect/re-init with a shared promise/lock (similar to connectionPromise in ReconnectingWebSocketTransport) and/or ensuring needsReInitialization is checked even when delegate is already set.
| await transport.sendData(data, 5000); | ||
|
|
||
| expect(onReconnect).toHaveBeenCalledTimes(1); | ||
| }); |
There was a problem hiding this comment.
The new onReconnect behavior isn’t exercised for failure cases. In ReconnectingWebSocketTransport there’s coverage asserting onReconnect rejections propagate; adding a similar test here would help prevent regressions (especially around retrying when onReconnect fails).
| }); | |
| }); | |
| it("should propagate onReconnect rejection during reconnection via sendData", async () => { | |
| const transport = new LazyWebsocketTransport({ | |
| getWsUrl: mockGetWsUrl, | |
| waitForReady: mockWaitForReady, | |
| showError: mockShowError, | |
| }); | |
| const reconnectError = new Error("onReconnect failed"); | |
| const onReconnect = vi.fn().mockRejectedValue(reconnectError); | |
| transport.onReconnect = onReconnect; | |
| // Initial connect should not trigger onReconnect | |
| await transport.connect(); | |
| expect(onReconnect).not.toHaveBeenCalled(); | |
| // Close to force reconnection on next sendData | |
| transport.close(); | |
| const data: any = { method: "test", params: [] }; | |
| // When reconnection occurs, onReconnect rejection should propagate | |
| await expect(transport.sendData(data, 5000)).rejects.toThrow("onReconnect failed"); | |
| expect(onReconnect).toHaveBeenCalledTimes(1); | |
| }); |
- Move needsReInitialization=false after onReconnect succeeds so a failed re-init can be retried on the next sendData call - Close the uninitialized delegate on onReconnect failure so the next attempt starts with a fresh connection - Update maxTimeoutMs JSDoc to reflect it's now a default, not a clamp - Add test for onReconnect rejection propagation and retry
- Cast ReconnectingWebSocket to IConnectionTransport to fix readyState type mismatch (number vs 0|1|2|3) - Remove @ts-expect-error for Uint8Array.fromBase64/toBase64 since tsgo with ESNext lib recognizes these natively - Use `"toBase64" in Uint8Array.prototype` instead of truthiness check to avoid "condition always true" error - Use this.close() in transport catch block to avoid narrowing issue
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
frontend/src/core/codemirror/copilot/transport.ts:44
maxTimeoutMsis now documented/used as a default timeout rather than a maximum, but the option name still reads like an upper bound. This is easy to misinterpret for future callers; consider renaming (with a backwards-compatible alias) or clarifying in the JSDoc that the name is historical and no longer clamps caller-provided timeouts.
/**
* Default timeout for sendData operations in milliseconds.
* Used when the caller does not provide an explicit timeout.
* @default 5000
*/
maxTimeoutMs?: number;
}
| @@ -38,7 +40,7 @@ function createConnectionTransport( | |||
| // long timeout -- the server can become slow when many notebooks | |||
| // are open. | |||
| connectionTimeout: 10_000, | |||
| }); | |||
| }) as unknown as IConnectionTransport; | |||
| } | |||
There was a problem hiding this comment.
The as unknown as IConnectionTransport cast disables structural type checking for the whole transport instance, which can mask future API drift (e.g., if partysocket/ws changes the reconnect signature). If the only incompatibility is readyState being typed as number, consider a narrower approach (e.g., a small wrapper or adjusting IConnectionTransport.readyState typing) so other members remain type-checked.
|
🚀 Development release published. You may be able to view the changes at https://marimo.app?v=0.22.1-dev20 |
…rvice not initialized" (marimo-team#8925) (marimo-team#8976)
Two root causes made Copilot connections unreliable:
Initialize timeout clamped to 5s — The LSP
initializerequest needs upto 30s, but
LazyWebsocketTransport.sendData()clamped all timeouts tomaxTimeoutMs(5000ms). Changed to usemaxTimeoutMsonly as a defaultwhen no timeout is provided, so the base class's 30s initialize timeout is
respected.
Retry didn't re-initialize the LSP — After a connection failure,
client.close()sets the transport's delegate toundefined. Clicking"Retry Connection" reconnected the websocket but never re-sent the LSP
initializehandshake, so Copilot responded with "Agent service notinitialized". Added an
onReconnectcallback on the transport that theclient uses to re-run
initialize()+sendConfiguration()afterreconnecting.
Closes #8925