This repository has been archived by the owner on Apr 12, 2021. It is now read-only.

Add an option to specify the number of messages to retrieve #19

Open
fplanjer opened this issue May 2, 2019 · 5 comments

Comments

@fplanjer

fplanjer commented May 2, 2019

I see that the number of messages retrieved in one call is set to 32 (or CloudQueueMessage.MaxNumberOfMessagesToPeek, which I assume is also 32).

Do you see any issue in extending the functionality to make this number configurable?

@Scooletz
Owner

Scooletz commented May 7, 2019

You are correct, currently the maximum number of messages is retrieved as one batch. Let me provide some reasoning against changing it:

If you can amortize the cost of accessing another service (for instance, blob storage) by gathering as many messages as possible, you should do it and "smart batch" them, which results in better throughput and lower running costs for your solution. If for any reason you know there's a limit (for instance, some kind of size limit), then by iterating over the batch in a foreach you can easily determine whether you can move forward, and acknowledge only the messages that have been processed so far. There's nothing wrong with issuing another call to the external service with the rest of the messages.

Does this show why I don't see a need to adjust it? Maybe you could provide some reasoning for this change @fplanjer?
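To make the "acknowledge only what you processed" pattern concrete, here is a minimal sketch; the size limit, the Process helper, and the assumption that the batch exposes CloudQueueMessage items are illustrative, not part of the library:

public static void MyFunc([QueueBatchTrigger("myqueue")] IMessageBatch batch)
{
  const long SizeLimit = 4 * 1024 * 1024; // hypothetical limit of the downstream service
  long bytesSoFar = 0;

  foreach (var msg in batch.Messages)
  {
    if (bytesSoFar + msg.AsBytes.Length > SizeLimit)
      break; // unacknowledged messages stay in the queue and are delivered again

    Process(msg);               // hypothetical call to the downstream service
    batch.MarkAsProcessed(msg); // acknowledge only what was actually handled
    bytesSoFar += msg.AsBytes.Length;
  }
}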

@fplanjer
Author

fplanjer commented May 8, 2019

Thanks for your reply and taking the time to explain. Your reasoning makes sense.

The reason we are looking into this is that the actual processing of the messages is done by an executable which handles at most 5 messages at a time (it processes images). Why exactly 5, I don't know, but the team creating the executable identified it as an efficient number of items in terms of CPU, memory, etc.

We could indeed process only the first 5 messages of the batch and not acknowledge the rest, but this seems less efficient than taking the right number of messages to start with. With the volumes we are looking at, we need to be as efficient as possible.

@Scooletz
Owner

Scooletz commented May 8, 2019

Ok, now I see it. I'm not sure why you would process only the first 5 and discard the rest, though. Can't you process 5 from the batch and then take another 5 from the same batch? Something like:

public static void MyFunc([QueueBatchTrigger("myqueue")] IMessageBatch batch)
{
  // GroupInFives is not a built-in operator; a sketch of such a helper follows below.
  foreach (var fiveMessages in batch.Messages.GroupInFives())
  {
    if (TryProcessImages(fiveMessages))
    {
      foreach (var msg in fiveMessages)
      {
        batch.MarkAsProcessed(msg);
      }
    }
  }
}
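GroupInFives is not an existing LINQ operator, so here is a minimal sketch of such an extension (the name and default chunk size are illustrative):

using System.Collections.Generic;

public static class MessageBatchExtensions
{
  // Splits a sequence into consecutive chunks of up to `size` items;
  // the last chunk may be smaller.
  public static IEnumerable<IReadOnlyList<T>> GroupInFives<T>(this IEnumerable<T> source, int size = 5)
  {
    var chunk = new List<T>(size);
    foreach (var item in source)
    {
      chunk.Add(item);
      if (chunk.Count == size)
      {
        yield return chunk;
        chunk = new List<T>(size);
      }
    }
    if (chunk.Count > 0)
      yield return chunk;
  }
}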

@tmenier

tmenier commented Dec 6, 2019

Thank you for this library. I'm evaluating whether it could be useful for my purposes, and upon reading the readme the most obvious question was: where do I set the batch size? If it's fixed at 32, that should at least be mentioned, but I agree with @fplanjer that there are plenty of reasons you'd want to configure it. It just seems reasonable to expect a "batching" library to provide this. In your example that takes 32 and breaks it into sub-batches of 5, sure, that works, but imagine how much nicer that code would be if you could just add BatchSize = 5 to the attribute. ;)

In my case, I'm sending data to an external API that has a batch limit of 100, so I'd actually want to increase it. The ultimate killer feature (and I can open a new issue if you think you'd consider it) would be the ability to specify both a batch size and a timeout value. That would let me say: "give me up to 100 items, but if a full minute has passed, just give me what you have."
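For illustration, the usage I have in mind would look something like this (BatchSize and MaxWaitSeconds are made-up parameters, not part of the current API):

public static void SendToApi(
  [QueueBatchTrigger("myqueue", BatchSize = 100, MaxWaitSeconds = 60)] // hypothetical parameters
  IMessageBatch batch)
{
  // "Give me up to 100 items, but if a full minute has passed, give me what you have."
  SendBatchToExternalApi(batch.Messages); // hypothetical call to the external API

  foreach (var msg in batch.Messages)
    batch.MarkAsProcessed(msg);
}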

Thanks again!

@Scooletz
Owner

The Listener not only gets up to 32 messages at a time (to reduce the overhead of fetching a small number) but also issues the gets in parallel:

var results = await Task.WhenAll(gets).ConfigureAwait(false);

This is done to address high-throughput scenarios. It'd be possible to set parallelGets to 1 and then amend the batch size. If you find a way to address it and want to issue a PR, @tmenier, I can review it in a week or two.
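For context, a conceptual sketch of the parallel fetch (parallelGets, queue, and the surrounding names are illustrative, assuming a CloudQueue; this is not the listener's exact code):

var gets = Enumerable.Range(0, parallelGets)
  .Select(_ => queue.GetMessagesAsync(CloudQueueMessage.MaxNumberOfMessagesToPeek))
  .ToArray();

var results = await Task.WhenAll(gets).ConfigureAwait(false);
var messages = results.SelectMany(batch => batch).ToArray();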
