Evaluate Batch Updates versus Bulk Updates #1511
Merged
Description
To do bulk updates we need the primary key of the notification table (notification.id), which we currently do not have.
Right now we are doing batch updates, which seem to work great, but the question is how well they scale. If they can't scale, we have to switch to bulk updates, which requires either running a query to fetch all the notification ids up front or storing them in Redis.
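To make the distinction concrete, here is a minimal sketch of the two approaches, assuming a PostgreSQL-style schema and psycopg2. Only the notification table and notification.id come from this PR; the status column, its values, and the connection details are hypothetical.

```python
# Sketch only: the notification table and notification.id are real,
# everything else (status column, DSN, driver) is an illustrative assumption.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN

def batch_update(cursor, batch_size=1000):
    """Batch update: claim up to `batch_size` rows per statement by
    predicate, without needing the ids ahead of time."""
    cursor.execute(
        """
        UPDATE notification
           SET status = 'sent'
         WHERE id IN (SELECT id
                        FROM notification
                       WHERE status = 'pending'
                       LIMIT %s)
        """,
        (batch_size,),
    )
    return cursor.rowcount

def bulk_update(cursor, notification_ids):
    """Bulk update: requires the primary keys (notification.id) up front,
    e.g. fetched with a query or pulled from Redis."""
    cursor.execute(
        "UPDATE notification SET status = 'sent' WHERE id = ANY(%s)",
        (list(notification_ids),),
    )
    return cursor.rowcount
```

The key trade-off: the batch form pays for a subselect on every statement, while the bulk form pushes that cost to wherever the id list is produced and stored.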
Add a debug statement to record the elapsed time of each batch update (sketched below). Currently we are doing batch updates of 1000. If each query takes 200 ms, that works out to roughly 5,000 records per second, or 300k records a minute, which is nuts because right now we are doing far less than 1000 records a minute. Conversely, if a query takes on the order of 2 minutes, we need to start switching to bulk updates immediately.
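A sketch of that debug timing, assuming a standard logger and the hypothetical batch_update() above; the function and names are illustrative, not the actual implementation:

```python
# Wrap each batch update and log rows updated, elapsed time, and
# implied throughput, so we can judge whether bulk updates are worth it.
import logging
import time

logger = logging.getLogger(__name__)

def timed_batch_update(cursor, batch_size=1000):
    start = time.perf_counter()
    updated = batch_update(cursor, batch_size)
    elapsed_ms = (time.perf_counter() - start) * 1000
    rate = updated / (elapsed_ms / 1000) if elapsed_ms else 0
    logger.debug(
        "batch update: %d rows in %.1f ms (~%.0f rows/sec)",
        updated, elapsed_ms, rate,
    )
    return updated
```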
Seeing how long the 1000-record batches take gives us a feel for how worthwhile it would be to switch to bulk updates. It will also give us data if we want to try batch sizes of 2000, 5000, 10000, etc.
Security Considerations
N/A