Avoid random failures when deleting environment

limactl delete is racy: while deleting one cluster it tries to access
files in other clusters' directories, and it fails if those files were
deleted in the meantime. Until this issue is fixed in lima, ensure that
only a single VM is deleted at a time.

Example failure:

    % drenv delete envs/regional-dr.yaml
    2024-09-13 05:59:57,159 INFO    [rdr] Deleting environment
    2024-09-13 05:59:57,169 INFO    [dr1] Deleting lima cluster
    2024-09-13 05:59:57,169 INFO    [dr2] Deleting lima cluster
    2024-09-13 05:59:57,169 INFO    [hub] Deleting lima cluster
    2024-09-13 05:59:57,255 WARNING [dr2] no such process
    2024-09-13 05:59:57,265 WARNING [dr2] remove /Users/nsoffer/.lima/dr2/ssh.sock: no such file or directory
    2024-09-13 05:59:57,265 WARNING [hub] remove /Users/nsoffer/.lima/hub/ssh.sock: no such file or directory
    2024-09-13 05:59:57,297 ERROR   [dr1] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory
    2024-09-13 05:59:57,297 ERROR   [hub] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory
    2024-09-13 05:59:57,298 ERROR   Command failed
    Traceback (most recent call last):
      ...
    drenv.commands.Error: Command failed:
       command: ('limactl', '--log-format=json', 'delete', '--force', 'dr1')
       exitcode: 1
       error:

Note how the delete commands for "dr1" and "hub" fail to read the
lima.yaml of cluster "dr2":

    2024-09-13 05:59:57,297 ERROR   [dr1] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory
    2024-09-13 05:59:57,297 ERROR   [hub] open /Users/nsoffer/.lima/dr2/lima.yaml: no such file or directory

With the lock, only one limactl delete process runs at a time, so it
cannot race with the deletion of other clusters.

Signed-off-by: Nir Soffer <[email protected]>
nirs committed Sep 18, 2024
1 parent 94ab16f commit c021197
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion test/drenv/providers/lima/__init__.py
@@ -7,6 +7,7 @@
 import os
 import subprocess
 import tempfile
+import threading
 import time
 from functools import partial

@@ -33,6 +34,11 @@
     "service_cluster_ip_range",
 )

+# limactl delete is racy: it tries to access lima.yaml of other clusters and
+# fails when the files are deleted by another limactl process. Until limactl
+# is fixed, allow only a single concurrent delete.
+_delete_vm_lock = threading.Lock()
+
 # Provider scope


@@ -118,7 +124,9 @@ def delete(profile):
     start = time.monotonic()
     logging.info("[%s] Deleting lima cluster", profile["name"])

-    _delete_vm(profile)
+    with _delete_vm_lock:
+        _delete_vm(profile)
+
     _delete_additional_disks(profile)
     _remove_kubeconfig(profile)

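The effect of the lock can be illustrated with a minimal standalone sketch. It assumes clusters are deleted from worker threads, as the interleaved timestamps in the log above suggest. All names here (`racy_delete`, `cleanup`, `delete_all`, `max_active`) are hypothetical stand-ins, not drenv or limactl APIs, and the racy command is simulated with a short sleep instead of running limactl.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the module-level lock added in this commit.
_delete_lock = threading.Lock()

# Bookkeeping used only to observe how many racy deletes overlap.
_state_lock = threading.Lock()
_active = 0
max_active = 0


def racy_delete(name):
    """Simulate a command that must not run concurrently with itself."""
    global _active, max_active
    with _state_lock:
        _active += 1
        max_active = max(max_active, _active)
    time.sleep(0.01)  # Stands in for the external process running.
    with _state_lock:
        _active -= 1


def cleanup(name):
    """Simulate per-cluster cleanup that is safe to run in parallel."""
    time.sleep(0.01)


def delete(name):
    # Only the racy step is serialized; the rest still runs concurrently.
    with _delete_lock:
        racy_delete(name)
    cleanup(name)
    return name


def delete_all(names):
    # One worker per cluster, so all deletes start at about the same time.
    with ThreadPoolExecutor(max_workers=len(names)) as executor:
        return list(executor.map(delete, names))


deleted = delete_all(["dr1", "dr2", "hub"])
```

Holding the lock only around the racy step keeps the serialized section small, which mirrors the change above, where `_delete_additional_disks` and `_remove_kubeconfig` stay outside the `with` block.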
