Huge Performance Drop observed with large payloads of around 300KB or more #702
YashasAnand asked this question in Q&A (unanswered)
-
Hey! On what machines is the cluster running?
-
This might be the issue. What do you see if you make this change?
- _ = Task.Run(async () => await ProcessMessageAsync(natsConsumerOptions, consumerOptions, onMessageReceived, jsMsg));
+ await ProcessMessageAsync(natsConsumerOptions, consumerOptions, onMessageReceived, jsMsg);
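In context, the suggested change would look roughly like this NATS.Net v2 sketch; the stream/consumer names and the handler are placeholders, since the actual consumer code isn't reproduced in this thread:

```csharp
using NATS.Client.Core;
using NATS.Client.JetStream;

// All names here are stand-ins for the poster's code, which isn't shown in the thread.
await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);
var consumer = await js.GetConsumerAsync("JobTestStream", "JobTestTopic1");

await foreach (var jsMsg in consumer.ConsumeAsync<byte[]>())
{
    // Before: _ = Task.Run(async () => await ProcessMessageAsync(jsMsg));  // fire-and-forget
    // After: awaiting keeps back-pressure on ConsumeAsync, so the client stops pulling
    // 300KB messages faster than it can process and ack them, and unfinished Task.Run
    // work no longer piles up in memory.
    await ProcessMessageAsync(jsMsg);
}

// Stand-in for the poster's handler: do the work, then ack.
static async Task ProcessMessageAsync(NatsJSMsg<byte[]> msg)
{
    // ... real processing goes here ...
    await msg.AckAsync();
}
```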
-
I am running a benchmark at around 300 RPS with a large payload of around 300KB against NATS JetStream in Kubernetes (installed via Helm). My client code is written in C#.
This is my producer code:
NatsMessageModel.cs
Consumer Code:
The payload I am sending is around 300KB. If I send small payloads of around 1KB or 10KB instead, we see very fast consume and produce rates. Can the above code be optimized, or is there anything specific we can do about this? I have also observed that the stream size grows exponentially in this case.
CPU and memory on the nodes go very high, around 90% on each NATS node, on a 3-node NATS JetStream cluster with the stream replicas currently set to 1 (temporarily for testing; I will increase replicas later).
NATS was installed using the Helm chart suggested in the community GitHub repo: https://github.com/nats-io/k8s
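The producer code itself isn't reproduced in this thread, so purely as a point of reference, a plain NATS.Net v2 JetStream publish of a ~300KB payload looks something like the sketch below (subject name and payload are made up):

```csharp
using NATS.Client.Core;
using NATS.Client.JetStream;

await using var nats = new NatsConnection();
var js = new NatsJSContext(nats);

var payload = new byte[300 * 1024]; // ~300KB, the size used in this benchmark

// A JetStream publish waits for the server's ack, so with 300KB messages the cost per
// message is dominated by this round-trip plus the server writing (and replicating) the data.
var ack = await js.PublishAsync("JobTestTopic1", payload);
ack.EnsureSuccess();
```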
Edit 1:
@Jarema please find the metrics below:
K8s cluster consisting of a 3-node NATS deployment through the official Helm chart.
-----------------------------------------------------TEST Case 1---------------------------------------------------------------------------------
Test Parameters:
Payload Size: 1KB
RPS: 1.5k
Test Duration: 10m
Stream Replicas: 3 (file storage)
Node Config: 2-core, 4GB RAM (1:1 CPU-to-vCPU ratio), 20GB storage
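For reference, a stream matching these parameters would be created roughly like this in NATS.Net v2; the real stream config appears only as a screenshot in the thread, so the stream name and subject scheme here are assumptions:

```csharp
using NATS.Client.JetStream;
using NATS.Client.JetStream.Models;

// js is a NatsJSContext as in the publish sketch above; names are assumed.
var stream = await js.CreateStreamAsync(new StreamConfig("JobTestStream", new[] { "JobTestTopic.*" })
{
    NumReplicas = 3,                    // Stream Replicas: 3
    Storage = StreamConfigStorage.File, // file storage
});
```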
Consumer Info:
We are using 75 consumers of the JobTestTopic type per pod, and we have 3 pods of the consumer service (225 consumers in total). 40 of these are unique durable consumers, e.g. JobTestTopic1, JobTestTopic2 ... JobTestTopic39; the other 35 consumers are clients bound to those same durable consumers, and JetStream distributes messages to them in round-robin fashion across the consumer threads.
Messages produced are round-robined to each consumer using a custom algorithm to distribute load across all consumers (roughly as sketched below).
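The custom distribution algorithm isn't shown in the thread; purely as an illustration of the idea, a round-robin subject picker on the producer side might look like this (the names and the modulo scheme are assumptions):

```csharp
using System.Linq;
using System.Threading;

// Hypothetical round-robin picker over the 40 durable-consumer subjects.
public sealed class RoundRobinSubjects
{
    private readonly string[] _subjects =
        Enumerable.Range(1, 40).Select(i => $"JobTestTopic{i}").ToArray();

    private int _next = -1;

    public string Next()
    {
        // Interlocked keeps the counter safe when many producer tasks publish concurrently;
        // the uint cast keeps the index non-negative if the counter ever wraps.
        var n = (uint)Interlocked.Increment(ref _next);
        return _subjects[n % (uint)_subjects.Length];
    }
}
```

Each produced message would then be published to the picked subject, e.g. `await js.PublishAsync(subjects.Next(), payload);`.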
Observations
CPU of the 3 NATS nodes: around 90%
Service (producer & consumer) CPUs: around 60 to 70%
NOTE: the produce rate and consume rate are almost equal, because processing & acking happen on a different thread in the consumer, as shown above.
Here is the stream & consumer info.
This is the consumer info for 2 of the consumers:
-----------------------------------------------------TEST Case 2---------------------------------------------------------------------------------
Exact same test as above, but with a freshly created stream, a payload size of 270KB, and around 300 RPS.
Test Parameters:
Payload Size: 270KB
RPS: 300
Test Duration: 10m
Stream Replicas: 3 (file storage)
Node Config: 2-core, 4GB RAM (1:1 CPU-to-vCPU ratio), 20GB storage
Consumer Info:
We are using 75 consumers of the JobTestTopic type per pod, and we have 3 pods of the consumer service (225 consumers in total). 40 of these are unique durable consumers, e.g. JobTestTopic1, JobTestTopic2 ... JobTestTopic39; the other 35 consumers are clients bound to those same durable consumers, and JetStream distributes messages to them in round-robin fashion across the consumer threads.
Messages produced are round-robined to each consumer using a custom algorithm to distribute load across all consumers (same as Test Case 1).
Observations
CPU of the 3 NATS nodes: around 85 to 100%
Service (producer & consumer) CPUs: around 35 to 50%
Observed pod restarts of NATS, and also publish errors in the producer service (logs attached in a later section).
Able to reach only 50 RPS.
Stream config
Consumer Info:
Before the pod restarts, these logs are observed:
Producer service exceptions: I also feel this "No response received from the server" error is happening because the NATS server is overwhelmed, since I can see the consumers still consuming, just slowly. Need your advice on this...
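On the "No response received from the server" errors: if the server really is just slow to ack under this load, one thing that may be worth trying (the retry values below are only illustrative, and this is not something confirmed in this thread) is allowing the JetStream publish to retry the ack request before it surfaces as an error:

```csharp
using System;
using NATS.Client.JetStream;

// js and payload as in the earlier publish sketch; the subject name is assumed.
var ack = await js.PublishAsync(
    subject: "JobTestTopic1",
    data: payload,
    opts: new NatsJSPubOpts
    {
        RetryAttempts = 5,                                  // retry the ack request a few more times
        RetryWaitBetweenAttempts = TimeSpan.FromSeconds(1), // wait between retries
    });
ack.EnsureSuccess();
```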