Commit: bulk_update_mappings adr

Kenneth Kehl committed Jan 22, 2025
1 parent e6899ef commit 669f4e9

Showing 2 changed files with 37 additions and 2 deletions.
8 changes: 6 additions & 2 deletions docs/adrs/0010-adr-celery-pool-support-best-practice.md
@@ -1,7 +1,7 @@
# Make best use of celery worker pools

- Status: N/A
- Date: N/A
+ Status: Accepted
+ Date: 7 January 2025

### Context
Our API application started with celery's default pool support of 'prefork' and a concurrency of 4. We continuously encountered instability, which we initially attributed to a resource leak. As a result, we added the configuration `worker-max-tasks-per-child=500`, which is a best practice. When we ran a load test of 25,000 simulated messages, however, we continued to see stability issues, culminating in a crash of the app after 4 hours that required a restage. After running `cf app notify-api-production` and observing that `cpu entitlement` was off the charts at 10000% to 12000% for the workers, and after doing some further reading, we came to the conclusion that perhaps `prefork` is not the best type of pool support for the API application.
@@ -10,8 +10,12 @@ The problem with `prefork` is that each process has a tendency to hang onto the

### Decision

We decided to try the 'threads' pool support with increased concurrency.
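Concretely, the change amounts to swapping the worker's pool flag and raising concurrency. A sketch of the invocation (the `-A run_celery` app module and the concurrency value here are illustrative, not necessarily our exact entrypoint or setting):

```shell
# Before: default prefork pool, concurrency 4, with process recycling
# celery -A run_celery worker --pool=prefork --concurrency=4 --max-tasks-per-child=500

# After: thread pool with higher concurrency
# (--max-tasks-per-child applies to prefork child processes, so it is dropped here)
celery -A run_celery worker --pool=threads --concurrency=10
```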

### Consequences

We saw an immediate decrease in CPU usage of about 70% with no adverse consequences.

### Author
@kenkehl

31 changes: 31 additions & 0 deletions docs/adrs/0011-adr-delivery-receipts-updates.md
@@ -0,0 +1,31 @@
# Optimize processing of delivery receipts

Status: Accepted
Date: 22 January 2025

### Context
Our original effort to get delivery receipts for text messages was very object oriented and conformed to other patterns in the app. After an individual message was sent, we would kick off a new task on a delay, and this task would search the CloudWatch logs for the given phone number.
On paper this looked good, but when one customer did a big send of 25k messages, we suddenly realized this was a bad idea. We overloaded the AWS API and got massive throttling as a result. Although we ultimately did get most of the delivery receipts, it took hours and the logs were filled with errors.

In refactoring this, there were two possible approaches we considered:

1. Batch updates in the db (up to 1000 messages at a time). This involves running update queries with CASE statements, and there is some practical limit on how large those statements can get while remaining efficient.

2. bulk_update_mappings(). This would be a raw update, similar to COPY, that could handle millions of rows at a time.
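As a sketch of approach 1, a single UPDATE with a CASE expression keyed on `message_id` can apply a whole batch in one statement. The table and column names here are illustrative, not our actual schema, and the demo runs against an in-memory SQLite database:

```python
import sqlite3

def batch_update_status(conn, receipts):
    """Apply one UPDATE for a batch of (message_id, status) pairs,
    using a CASE expression keyed on message_id."""
    case_parts = " ".join("WHEN ? THEN ?" for _ in receipts)
    params = [p for mid, status in receipts for p in (mid, status)]
    ids = [mid for mid, _ in receipts]
    placeholders = ",".join("?" for _ in ids)
    sql = (
        f"UPDATE notifications "
        f"SET status = CASE message_id {case_parts} END "
        f"WHERE message_id IN ({placeholders})"
    )
    conn.execute(sql, params + ids)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notifications (id INTEGER PRIMARY KEY, message_id TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO notifications (message_id, status) VALUES (?, ?)",
    [("m1", "sending"), ("m2", "sending"), ("m3", "sending")],
)
batch_update_status(conn, [("m1", "delivered"), ("m2", "failed")])
print(dict(conn.execute("SELECT message_id, status FROM notifications")))
# {'m1': 'delivered', 'm2': 'failed', 'm3': 'sending'}
```

Note that this keys the update on `message_id` alone, which is exactly the property that makes approach 1 workable for us (see the Decision below).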

### Decision

We decided to use batch updates. Even though they don't theoretically scale to the same level as bulk_update_mappings(), our app has a potential problem with using bulk_update_mappings(). In order for it to work, we would need to know the "id" of each notification, which is the primary key of the notifications table. We do NOT know the "id" when we process the delivery receipts. We do know the "message_id", but in order to get the "id" we would either have to run a select query, or we would have to maintain some mapping in redis, etc.

It is not clear, given the extra work required, that bulk_update_mappings() would be greatly superior to batch updates for our purposes. And batch updates currently allow us to scale at least 100x above where we are now.
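The 1000-row cap just means splitting the incoming receipts into fixed-size batches before issuing each UPDATE. A minimal sketch (the batch size and receipt shape are illustrative):

```python
def chunked(items, size=1000):
    """Yield successive fixed-size batches from a list of receipts."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 2500 fake (message_id, status) receipts -> three batches
receipts = [(f"m{n}", "delivered") for n in range(2500)]
batches = list(chunked(receipts, 1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```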

### Consequences

Batch updates greatly cleaned up the logs (no more throttling errors) and reduced CPU consumption. It was a very positive change.

### Author
@kenkehl

### Stakeholders
@ccostino
@stvnrlly
