Out of memory issue #526

Closed
sathibault opened this issue Oct 18, 2019 · 6 comments

Comments

@sathibault

I have a consumer that is consuming massive amounts of memory and being killed by the OS.

I've commented actual processing out, so the consumer is empty:

  await consumer.run({
    eachMessage: async record => {
    }
  });

From strace I see the following:

{"level":"debug","logger":"kafkajs","message":"Request Fetch(key: 1, version: 7)","broker":"stage-health-intelligence-kafka-3:9092","clientId":"boundary-service","correlationId":7,"expectResponse":true,"size":320,"timestamp":"2019-10-18T17:36:04.238Z"}
brk(0x57e4000)                          = 0x57e4000
brk(0x5be4000)                          = 0x5be4000
...
{"level":"debug","logger":"kafkajs","message":"Response Fetch(key: 1, version: 7)","broker":"stage-health-intelligence-kafka-2:9092","clientId":"boundary-service","correlationId":1,"size":1049050,"data":"[filtered]","timestamp":"2019-10-18T17:36:06.999Z"}
brk(0x1610dc000)                        = 0x1610dc000
brk(0x1611ce000)                        = 0x1611ce000

These brk calls happen between two successive log messages, so the process size is growing from 0x57e4000 to 0x1610dc000 between them.
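A minimal way to watch this from inside the process, assuming nothing beyond Node's built-in process.memoryUsage(), would be to log memory from the otherwise empty handler and line it up with the Fetch debug entries:

  await consumer.run({
    eachMessage: async record => {
      // Log resident set size and V8 heap stats so growth can be correlated
      // with the Request/Response Fetch entries in the kafkajs debug log.
      const mb = n => (n / 1024 / 1024).toFixed(1);
      const { rss, heapUsed, external } = process.memoryUsage();
      console.log(`rss=${mb(rss)}MB heapUsed=${mb(heapUsed)}MB external=${mb(external)}MB`);
    }
  });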

Any leads on how I can further troubleshoot this?

@sathibault
Author

I'm using kafkajs 1.11.0 with kafkajs-lz4 1.2.1

@JaapRood
Collaborator

Normally I'd say any out-of-memory issue comes from broken back-pressure, but that doesn't hold up where memory grows 100x between 2 messages 😅. I don't really have any useful intuition about where else this might be coming from, but here are some suggestions on how I might try to narrow down the root cause:

  • I'd try to create a memory profile through Chrome DevTools (by running node --inspect-brk). There's a chance the profiling won't work because of the massive jump in memory, but it's worth a try, as it should tell you where memory was allocated (see the sketch after this list).
  • I'd try running against a different topic, possibly without the LZ4 compression, to either rule that out as a factor or highlight it as one.
  • I'd possibly try a different Node (and thus V8) version (which version are you on, actually?). It's rare, but I have seen the occasional V8 bug over the years.
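
For the first point, a minimal sketch of capturing heap snapshots around one of those large fetches (assuming Node >= 11.13, which ships v8.writeHeapSnapshot; the file names and the 30-second delay are placeholders):

  const v8 = require('v8');

  // Snapshot before the consumer starts fetching...
  v8.writeHeapSnapshot('before-run.heapsnapshot');

  await consumer.run({
    eachMessage: async record => {}
  });

  // ...and again once a large Fetch response should have arrived. Load both
  // files in Chrome DevTools (Memory tab) and compare them to see which
  // objects are retaining the allocated memory.
  setTimeout(() => v8.writeHeapSnapshot('after-fetch.heapsnapshot'), 30 * 1000);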

@tulios @Nevon have more intimate knowledge of the internals, so perhaps they can be of more concrete help!

@Nevon
Collaborator

Nevon commented Oct 21, 2019

I would strongly suspect lz4 of being the culprit. We are running the same version in production (without compression) with stable memory consumption:

[Screenshot 2019-10-21: production memory consumption]

While it's not impossible that it's something on our side, perhaps related to some specific Node version, it feels more likely to be lz4, given that it's a native lib with known memory leaks.

It would be good if you could try without compression and see if you have the same issue, just to verify whether or not the leak is coming from us.
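
A minimal sketch of that comparison, assuming you can fill a separate uncompressed test topic (broker, topic name and message count are placeholders; the codec registration follows the kafkajs-lz4 README):

  const { Kafka, CompressionTypes, CompressionCodecs } = require('kafkajs');
  const LZ4 = require('kafkajs-lz4');

  // Comment this registration out for the uncompressed run, so the native
  // lz4 bindings are never loaded at all.
  CompressionCodecs[CompressionTypes.LZ4] = new LZ4().codec;

  const kafka = new Kafka({ clientId: 'leak-test', brokers: ['localhost:9092'] });

  // Fill a test topic with uncompressed messages...
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'leak-test-uncompressed',
    compression: CompressionTypes.None,
    messages: Array.from({ length: 10000 }, (_, i) => ({ value: `msg-${i}` })),
  });

  // ...then point the same empty consumer at it and watch memory.
  const consumer = kafka.consumer({ groupId: 'leak-test' });
  await consumer.connect();
  await consumer.subscribe({ topic: 'leak-test-uncompressed', fromBeginning: true });
  await consumer.run({ eachMessage: async () => {} });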

@sathibault
Author

Thanks. Unfortunately, I'm unable to disable lz4 on the topic and was unable to spend any more time debugging this. I've migrated to node-rdkafka.

@ankon
Contributor

ankon commented Jun 21, 2020

While it's not impossible that it's something on our side, perhaps related to some specific Node version, it feels more likely to be lz4, given that it's a native lib with known memory leaks.

Mostly for the record: I saw this issue today and found the idea of a leak in lz4 very scary -- we're also using lz4-compressed messages, and we have been seeing memory issues.

But after some playing around, it looks like the lz4 leak in the linked issue is actually a problem with the benchmark: there simply weren't any GCs running. See my comment over there for details.

For us this did bring up something else, though: looking at our batch processing logic, I found the same issue the test had! We were starving the event loop because our batch processing was essentially a tight loop of async/await code, and we never gave Node.js time to process anything else until the batch was eventually done. We've now fixed this by artificially introducing event loop runs (essentially we wrap our per-message processing in a setImmediate), and I hope to see memory usage drop now as well.
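
A minimal sketch of that workaround, assuming an eachBatch-style handler (handleRecord stands in for the real per-message processing):

  await consumer.run({
    eachBatch: async ({ batch, resolveOffset, heartbeat }) => {
      for (const message of batch.messages) {
        // Defer each message to a fresh macrotask so timers, I/O callbacks
        // and other queued work get a chance to run between messages instead
        // of being starved by a tight async/await loop.
        await new Promise((resolve, reject) => {
          setImmediate(() => handleRecord(message).then(resolve, reject));
        });
        resolveOffset(message.offset);
        await heartbeat();
      }
    }
  });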

@paras-nference

@sathibault Hey, I'm facing the same memory-leak issue in my Node app and want to find out whether it's coming from Kafka alone. I want to print the process size in the log like you are doing, but I couldn't find how to enable this. Can you please help me with it?
{"level":"debug","logger":"kafkajs", ... ,"size":320, ....}

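For reference, the quoted entries come from KafkaJS's built-in logger (the size field there is the request/response size, not the process size); a minimal sketch of enabling it, assuming the standard logLevel option (broker list is a placeholder):

  const { Kafka, logLevel } = require('kafkajs');

  // With logLevel.DEBUG the client logs every Request/Response Fetch,
  // including the "size" field shown above.
  const kafka = new Kafka({
    clientId: 'boundary-service',
    brokers: ['localhost:9092'],
    logLevel: logLevel.DEBUG,
  });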