Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Quick startup #2436

Merged
merged 5 commits into from
Dec 28, 2024
Merged

Add Quick startup #2436

merged 5 commits into from
Dec 28, 2024

Conversation

amnonh
Copy link
Collaborator

@amnonh amnonh commented Dec 25, 2024

This series adds an option to a quicker startup process; it does not wait for each container process to start.

This method of starting the containers will not stop if a container fails to start correctly.

When experimenting on my laptop without Prometheus data,
startup time reduced from 45s to 1.5s.

To use the quick-startup, use the --quick-startup command line flag with start-all.sh or set ``QUICK_STARTUP=1``` in the env file.

if [ ! "$QUICK_STARTUP" = "1" ]; then
until $(curl --output /dev/null -f --silent http://localhost:$ALERTMANAGER_PORT) || [ $TRIES -eq $RETRIES ]; do
((TRIES = TRIES + 1))
sleep 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably reduce this to 1 second and increase the retries count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, curl has all the built-in functionality for retrying. No real need for this loop.

Copy link
Contributor

@nyh nyh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, so I "approve", but I do think you should rethink the new fast option as something good enough to be the default - not some "nice option you shouldn't use in practice" like you suggested it is. If that requires changing start-all.sh to check something after starting all the processes and not start and ignore what happened - then so be it.

start-all.sh could even make the waiting experience more pleasant by adding a few printouts. For example, imagine that Grafana really does take 30 seconds to become responsive and we can't make it any faster. Then start-all.sh could print "Starting Prometheus and Grafana" and then just 1.5 seconds later print "Waiting for servers to become responsive..." and 30 seconds later we'll get "Grafana is up".

@@ -104,6 +104,7 @@ Options:
--limit container,param - Allow to set a specific Docker parameter for a container, where container can be:
prometheus, grafana, alertmanager, loki, sidecar, grafanarender
--archive data-directory - Treat data directory as an archive. This disables Prometheus time-to-live (infinite retention), and would run a minimal mode
--quick-startup - If set, the script will not validate that each of the processes start correctly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A thought:
start-all could have two modes of operations

  1. This "--quick-startup" means start-all doesn't wait at all for anything to come up.
  2. The other option is to run each service with --quick-startup, but then wait for all of them together.
    The existing mode of operation, starting each background process in order and waiting for each one separately, should probably not exist at all. If you don't think option 1 should become the default, option 2 can. The current situation (wait in order) isn't useful at all. Or so I think.

@@ -216,6 +217,9 @@ for arg; do
--no-alertmanager)
SKIP_ALERTMANAGER=1
;;
--quick-startup)
QUICK_STARTUP=1
;;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something looks off in the indentation here, but maybe it's just github showing it wrongly?

@@ -681,7 +688,7 @@ if [ "$SKIP_ALERTMANAGER" = "1" ]; then
AM_ADDRESS="127.0.0.1:9093"
else
echo "Wait for alert manager container to start"
AM_ADDRESS=$(./start-alertmanager.sh $ALERTMANAGER_PORT $ALERT_MANAGER_DIR -D "$DOCKER_PARAM" $LIMITS $VOLUMES $PARAMS $ALERTMANAGER_COMMAND $BIND_ADDRESS_CONFIG $ALERT_MANAGER_RULE_CONFIG)
AM_ADDRESS=$(./start-alertmanager.sh $ALERTMANAGER_PORT "$QUICK_STARTUP_CMD" $ALERT_MANAGER_DIR -D "$DOCKER_PARAM" $LIMITS $VOLUMES $PARAMS $ALERTMANAGER_COMMAND $BIND_ADDRESS_CONFIG $ALERT_MANAGER_RULE_CONFIG)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you wanted the quotes around "$QUICK_STARTUP_CMD"? It means if this variable is empty (as it is by default), and empty string parameter is passed to the script. Which maybe doesn't affect it, but it's strange nonetheless.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not exist by default add the quote make sure that it will not fail the script

@nyh
Copy link
Contributor

nyh commented Dec 26, 2024

You forgot "Fixes #2433"

This patch adds the quick-startup option and reduce the time between
iterations when waiting for the alertmanager to start.
This patch adds the quick-startup option and reduce the time between
iterations when waiting for the loki to start.
This patch adds the quick-startup option and reduce the time between
iterations when waiting for the grafana to start.
This patch adds the quick-startup option and reduce the time between
iterations when waiting for the prometheus to start.
@amnonh amnonh merged commit 14df059 into scylladb:master Dec 28, 2024
1 check passed
@amnonh amnonh deleted the quick_starup branch December 28, 2024 21:29
@amnonh amnonh added this to the Monitoring 4.9 milestone Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants