webSocket auto reconnect doesn't work if socket is closed #2325

Closed
1 task done
yivlad opened this issue May 28, 2024 · 23 comments
Labels
Good First Issue

Comments

@yivlad

yivlad commented May 28, 2024

Check existing issues

Viem Version

2.13.1

Current Behavior

We use webSocket transport in our frontend app. After some period of inactivity browsers automatically close the web socket connections. When a user leaves the tab open and gets back after a while, webSocket transport is unable to recover from the socket CLOSED state and requests keep failing.

Expected Behavior

The webSocket transport creates a new connection in place of the closed one.

Steps To Reproduce

Minimal reproducible example using only viem:

import { createPublicClient, parseAbi, webSocket } from "viem";

const client = createPublicClient({
  transport: webSocket(
    "wss://mainnet.infura.io/ws/v3/84842078b09946638c03157f83405213"
  ),
});

const blockNumber = await client.readContract({
  abi: parseAbi(["function getBlockNumber() view returns (uint256)"]),
  address: "0xca11bde05977b3631167028862be2a173976ca11",
  functionName: "getBlockNumber",
});

console.log({
  blockNumber
});

(await client.transport.getRpcClient()).socket.close(); // simulate closing by browser

await client.readContract({
  abi: parseAbi(["function getBlockNumber() view returns (uint256)"]),
  address: "0xca11bde05977b3631167028862be2a173976ca11",
  functionName: "getBlockNumber",
});

In the provided code the second request fails because the web socket connection is closed.
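
A minimal sketch of a guard an application could run before issuing a request, assuming the object returned by `client.transport.getRpcClient()` exposes a standard WebSocket on `.socket`, as the repro above does. `isSocketUsable` is a hypothetical helper, not part of viem; the numeric states are the standard WebSocket `readyState` values.

```typescript
// Standard WebSocket readyState values: CONNECTING=0, OPEN=1, CLOSING=2, CLOSED=3.
const READY_STATE_OPEN = 1

// Minimal structural type so the helper accepts any WebSocket-like object.
interface ReadyStateLike {
  readyState: number
}

// Hypothetical helper: true only while the socket can currently send data.
function isSocketUsable(socket: ReadyStateLike): boolean {
  return socket.readyState === READY_STATE_OPEN
}
```

In the repro, calling `isSocketUsable((await client.transport.getRpcClient()).socket)` after `socket.close()` would return `false`, since the socket is then closing or closed.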

Link to Minimal Reproducible Example

No response

Anything else?

No response

@badgerdf

badgerdf commented Jun 3, 2024

Same issue on the backend side (NestJS framework). Moreover, viem doesn't log socket disconnects or reconnects; it just looks like the websocket is stuck for no reason.

Here is the client creation code.

createPublicClient({
  chain: Viem.utils.extractChain(CONFIG.CHAIN_ID),
  transport: webSocket(CONFIG.RPC.WS, {
    reconnect: true,
    retryCount: Infinity,
    retryDelay: 1000,
    timeout: 1000
  }),
})

@jxom added the Good First Issue label Jun 8, 2024
@0x33dm
Contributor

0x33dm commented Jul 18, 2024

I have the same problem. Moreover, I didn't find a way of being "notified" (like an event listener) about the connection status.

@flux0uz

flux0uz commented Jul 19, 2024

Same problem for us, impossible to index via websocket, no disconnection errors and no automatic reconnection. This is really annoying, because we're going to have to listen via http and explode our alchemy quotas!
Is a fix in the works?

@0x33dm
Contributor

0x33dm commented Jul 19, 2024

> Same problem for us, impossible to index via websocket, no disconnection errors and no automatic reconnection. This is really annoying, because we're going to have to listen via http and explode our alchemy quotas! Do you know if a fix is in the works?

You can increase the number of reconnect attempts (https://viem.sh/docs/clients/transports/websocket.html), but I don't see a way to be notified of the connection status.

The only way I can think of to index all blocks with viem would be to go chunk by chunk using http requests.

@izayl
Contributor

izayl commented Jul 22, 2024

In my case, it was because of the backend nginx timeout setting.

When I subscribe with indexed topics, there may be a long time without any response. If the backend sets a max timeout, the socket will be closed with code 1006 ('Abnormal Closure').

There are two ways to resolve this issue:

  1. Check the code of every CloseEvent and route it through the onError logic first, to reuse the current reconnect logic. (A timeout close like the one above does not trigger onerror.)

  2. Implement keep-alive.
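
The first option could look roughly like this. It is an illustration only, not viem's implementation: `shouldReconnect` is a hypothetical helper, and treating every close code other than 1000 (normal closure) as a reason to reconnect is an assumption.

```typescript
// Close code for a clean, intentional shutdown (RFC 6455).
// Code 1006 ("Abnormal Closure") means the connection dropped without a close frame.
const NORMAL_CLOSURE = 1000

// Only the field we need from a CloseEvent.
interface CloseEventLike {
  code: number
}

// Hypothetical policy: reconnect on anything but a normal closure.
function shouldReconnect(event: CloseEventLike): boolean {
  return event.code !== NORMAL_CLOSURE
}
```

A `close` listener on the socket could call `shouldReconnect(event)` and, when it returns `true`, hand the event to the same code path that already handles errors.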

I just implemented the keep-alive feature: #2516

It keeps sending a ping to the WS server every 30s, and devs can set any value lower than their backend timeout setting to make sure the connection stays alive.
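
A minimal sketch of the keep-alive idea, independent of the linked PR: the helper names and the payload are hypothetical, and what counts as a valid keep-alive message depends on the server. The 30 second default mirrors the interval mentioned above.

```typescript
// Minimal structural type: anything that can send text and report its state.
interface SendableSocket {
  readyState: number
  send(data: string): void
}

const WS_OPEN = 1 // standard WebSocket readyState for an open connection

// Send one keep-alive message if the socket is open; returns whether it was sent.
function sendKeepAlive(socket: SendableSocket, payload: string): boolean {
  if (socket.readyState !== WS_OPEN) return false
  socket.send(payload)
  return true
}

// Send `payload` every `intervalMs` milliseconds; returns a function that stops the timer.
function startKeepAlive(
  socket: SendableSocket,
  payload: string,
  intervalMs = 30000,
): () => void {
  const timer = setInterval(() => sendKeepAlive(socket, payload), intervalMs)
  return () => clearInterval(timer)
}
```

The interval should be shorter than the backend's idle timeout, and the returned function should be called once the socket is no longer needed so the timer does not keep the process alive.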

Happy to receive feedback.

@0x33dm
Contributor

0x33dm commented Jul 22, 2024

> happy to receive some feedbacks

One thing I miss/couldn't find is a way to subscribe to connection events, like "connected," "disconnected," or "error," in order to make the application more self-aware.

During my initial tests with WS transport, I found that WS would disconnect and fail silently, and my application would stop.

I think either I'm missing something from the documentation or there is room for improvement.

@izayl
Contributor

izayl commented Jul 23, 2024

> > happy to receive some feedbacks
>
> One thing I miss/couldn't find is a way to subscribe to connection events, like "connected," "disconnected," or "error," in order to make the application more self-aware.
>
> During my initial tests with WS transport, I found that WS would disconnect and fail silently, and my application would stop.
>
> I think either I'm missing something from the documentation or there is room for improvement.

I think you can get the socket first and subscribe to its events yourself for advanced usage:

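import { createPublicClient, webSocket } from 'viem'
import { mainnet } from 'viem/chains'
// Assumption: getWebSocketRpcClient is exported from 'viem/utils'.
import { getWebSocketRpcClient } from 'viem/utils'

const WS_URL = ''
const webSocketClient = createPublicClient({
  chain: mainnet,
  transport: webSocket(WS_URL)
})

// Get the underlying socket so you can attach your own listeners to it.
const socketClient = await getWebSocketRpcClient(WS_URL)
const socket = socketClient.socket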

@0x33dm
Contributor

0x33dm commented Jul 23, 2024

> getWebSocketRpcClient

Interesting, I didn't find docs for this function on the viem website, but VS Code finds it.

Thank you.

@jxom
Member

jxom commented Jul 26, 2024

Fixed via 44281e8

@jxom closed this as completed Jul 26, 2024
@0x33dm
Contributor

0x33dm commented Jul 26, 2024

> Fixed via 44281e8

^ amazing

Would it be possible to also add documentation on how to listen for connection status on the websocket, so services that rely on monitoring each block can recover quickly?

From @izayl I got the hint to look for getWebSocketRpcClient, but I haven't played with it yet.

thank you

@flux0uz

flux0uz commented Jul 27, 2024

Hey @jxom

It seems that the problem persists, we updated our staging environment yesterday and this morning the events are not caught... So the websocket connection must have closed...
No errors on the watchContractEvent onError callback.

@izayl
Contributor

izayl commented Jul 27, 2024

@flux0uz you can try the following code to check why the WS closes first, and see what the code of the close event is.

Replace ws_rpc_url with your own URL.

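// Assumption: getWebSocketRpcClient is exported from 'viem/utils'.
import { getWebSocketRpcClient } from 'viem/utils'

// publicClient and ws_rpc_url come from your own setup.
if (publicClient.transport.type === 'webSocket') {
  const client = await getWebSocketRpcClient(ws_rpc_url!)
  const socket = client.socket
  // Log the CloseEvent (including its code) whenever the socket closes.
  socket.addEventListener('close', (e) => {
    console.error('websocket closed', e)
  })
}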
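
The same approach can be extended to the "connected", "disconnected", and "error" statuses asked about earlier in the thread. This is a minimal sketch, not a viem API: `watchConnection` and `ConnectionStatus` are hypothetical names, and the socket is typed structurally so the example runs without a real connection.

```typescript
type ConnectionStatus = 'connected' | 'disconnected' | 'error'

// Minimal structural type covering the part of the WebSocket interface used here.
interface ListenableSocket {
  addEventListener(type: string, listener: (event: unknown) => void): void
}

// Hypothetical helper: forward socket lifecycle events to a single callback.
function watchConnection(
  socket: ListenableSocket,
  onStatus: (status: ConnectionStatus, event: unknown) => void,
): void {
  socket.addEventListener('open', (e) => onStatus('connected', e))
  socket.addEventListener('close', (e) => onStatus('disconnected', e))
  socket.addEventListener('error', (e) => onStatus('error', e))
}
```

A service that monitors every block could use the callback to raise an alert or rebuild its client when it sees `'disconnected'`.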

@flux0uz

flux0uz commented Jul 28, 2024

Thanks @izayl, no errors were detected on the socket "close" event. I've just realised that the problem is with the fallback([]). By removing the fallback on the public provider used to listen to events, everything works normally!

@jxom
The fallback option for websocket transport is breaking the auto-reconnect and not throwing any errors.

@flux0uz

flux0uz commented Jul 29, 2024

Back to this issue! Another test this morning, listening to the sockets close (with @izayl code). No close log, and we don't receive any events. PublicProvider with a single websocket connection via Alchemy and no fallback.

@izayl
Contributor

izayl commented Jul 29, 2024

> Back to this issue! Another test this morning, listening to the sockets close (with @izayl code). No close log, and we don't receive any events. PublicProvider with a single websocket connection via Alchemy and no fallback.

Double-check your ABI; maybe the parameter order is wrong or an indexed flag is missing.

@flux0uz

flux0uz commented Jul 29, 2024

Nope, it works correctly with the http transport, but that uses a ton of our Alchemy quota!
What's more, it works perfectly for a while just after deployment - we wait about 1 day and then nothing is caught!

@flux0uz

flux0uz commented Jul 29, 2024

I'm now listening to the message event on the dev environment. The first message is OK, then I keep getting:
[Symbol(kData)]: '{"id":null,"jsonrpc":"2.0","error":{"code":-32700,"message":"Parse error"}}'
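
For reference, -32700 is the JSON-RPC 2.0 "Parse error" code, meaning the server received something it could not parse as JSON. A minimal sketch of flagging such responses in a message listener follows; `getRpcError` and `isParseError` are hypothetical helpers, not viem APIs.

```typescript
interface JsonRpcError {
  code: number
  message: string
}

const PARSE_ERROR = -32700 // JSON-RPC 2.0: invalid JSON was received by the server

// Return the error object of a JSON-RPC response, or undefined if there is none.
function getRpcError(raw: string): JsonRpcError | undefined {
  try {
    const parsed = JSON.parse(raw) as { error?: JsonRpcError }
    return parsed.error
  } catch {
    return undefined // not valid JSON, so not a JSON-RPC response
  }
}

// True when the raw message is a JSON-RPC "Parse error" response.
function isParseError(raw: string): boolean {
  return getRpcError(raw)?.code === PARSE_ERROR
}
```

Logging the outgoing message that precedes each such response would show which payload the server is rejecting.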

@justefg

justefg commented Jul 29, 2024

I confirm, it doesn't work. Please run a minimal reproducible example for at least 2 hours before merging.

@justefg

justefg commented Jul 29, 2024

You might want to take a look at Alchemy's WS implementation. I tested it and it's rock solid, with no connection breaking. Looking at your fix, why do you use ping if it's not supported by the JSON-RPC API? I suggest using net_version or chainId like Alchemy does:
https://github.com/alchemyplatform/alchemy-sdk-js/blob/master/src/api/alchemy-websocket-provider.ts#L746-L757

rpc endpoints:
https://docs.alchemy.com/reference/net-version
https://www.quicknode.com/docs/ethereum/net_version
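
A sketch of this suggestion: use an ordinary JSON-RPC call as the heartbeat instead of a ping message. The helper names and the id scheme are assumptions; `net_version` takes no parameters and returns the network ID as a string.

```typescript
// Build a JSON-RPC 2.0 request for `net_version`, usable as a heartbeat message.
function buildHeartbeat(id: number): string {
  return JSON.stringify({ jsonrpc: '2.0', id, method: 'net_version', params: [] })
}

// True when `raw` is a successful response to the heartbeat with the given id.
function isHeartbeatReply(raw: string, id: number): boolean {
  try {
    const parsed = JSON.parse(raw) as { id?: unknown; result?: unknown }
    return parsed.id === id && typeof parsed.result === 'string'
  } catch {
    return false // not valid JSON (or not an object), so not a reply
  }
}
```

A caller could send `buildHeartbeat(id)` on a timer and treat a missing reply within some deadline as a dead connection that needs to be re-established.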

@justefg

justefg commented Jul 30, 2024

Sorry for spamming a bit, but I really need this feature as it's blocking us from moving forward. This seems to be a common issue for web3 libraries: ethers has the same issue with no ETA, and web3.js claims to have resolved it recently.
web3/web3.js#6968

If you need my help I'll be glad to assist.

@jxom
Member

jxom commented Jul 30, 2024

I published a release this morning which makes socket closure handling a bit more robust.

If that doesn't solve your issue, please favor opening a new issue with a minimal reproducible example over replying to this closed issue.

@justefg

justefg commented Jul 30, 2024

Unfortunately, it still doesn't work. I created another ticket with steps on how to reproduce the issue.
#2563

Contributor

This issue has been locked since it has been closed for more than 14 days.

If you found a concrete bug or regression related to it, please open a new bug report with a reproduction against the latest Viem version. If you have any questions or comments you can create a new discussion thread.

@github-actions bot locked as resolved and limited conversation to collaborators Aug 14, 2024