Runner fleeting implementation #1185
Hi, last week I also tried to set up the runner with the fleeting plugin, but I got stuck with the following error message:
Here is my Terraform configuration:

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

data "aws_security_group" "default" {
  name   = "default"
  vpc_id = module.vpc.vpc_id
}

# VPC Flow logs are not needed here
# kics-scan ignore-line
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.13.0"

  name                    = "vpc-${var.environment}"
  cidr                    = "10.0.0.0/16"
  azs                     = [data.aws_availability_zones.available.names[0]]
  private_subnets         = ["10.0.1.0/24"]
  public_subnets          = ["10.0.101.0/24"]
  map_public_ip_on_launch = true

  tags = {
    Environment = var.environment
  }
}

module "vpc_endpoints" {
  source  = "terraform-aws-modules/vpc/aws//modules/vpc-endpoints"
  version = "5.13.0"

  vpc_id = module.vpc.vpc_id

  endpoints = {
    s3 = {
      service = "s3"
      tags    = { Name = "s3-vpc-endpoint" }
    }
  }

  tags = {
    Environment = var.environment
  }
}

module "runner" {
  source = "cattle-ops/gitlab-runner/aws"

  environment = var.environment
  vpc_id      = module.vpc.vpc_id
  subnet_id   = element(module.vpc.public_subnets, 0)

  runner_cloudwatch = {
    enable = false
  }

  runner_instance = {
    collect_autoscaling_metrics = ["GroupDesiredCapacity", "GroupInServiceCapacity"]
    name                        = var.runner_name
    type                        = "t3.small"
    ssm_access                  = true
    monitoring                  = true
    private_address_only        = false
  }

  runner_networking = {
    allow_incoming_ping_security_group_ids = [data.aws_security_group.default.id]
  }

  runner_gitlab = {
    url                                           = var.gitlab_url
    preregistered_runner_token_ssm_parameter_name = var.preregistered_runner_token_ssm_parameter_name
  }

  runner_worker = {
    type       = "docker-autoscaler"
    ssm_access = true
  }

  runner_worker_docker_autoscaler = {
    fleeting_plugin_version = "1.0.0"
  }

  runner_worker_docker_autoscaler_ami_owners = ["591542846629"]
  runner_worker_docker_autoscaler_ami_filter = {
    name = ["al2023-ami-ecs-hvm-2023.0.20240905-kernel-6.1-x86_64"]
  }

  runner_worker_docker_machine_instance = {
    monitoring           = true
    private_address_only = false
    subnet_ids           = module.vpc.public_subnets
  }

  runner_worker_docker_autoscaler_instance = {
    root_size            = 16
    monitoring           = true
    private_address_only = false
  }

  runner_worker_docker_autoscaler_asg = {
    subnet_ids                               = module.vpc.public_subnets
    types                                    = ["m5.large", "m5.xlarge"]
    enable_mixed_instances_policy            = true
    on_demand_base_capacity                  = 1
    on_demand_percentage_above_base_capacity = 0
    max_growth_rate                          = 6
  }

  runner_worker_docker_autoscaler_autoscaling_options = [
    {
      periods      = ["* * * * *"]
      timezone     = var.timezone
      idle_count   = 0
      idle_time    = "0s"
      scale_factor = 0
    },
    {
      periods      = ["* 8-17 * * mon-fri"]
      timezone     = var.timezone
      idle_count   = 0
      idle_time    = "1m"
      scale_factor = 0
    }
  ]

  runner_worker_docker_options = {
    privileged = true
    image      = "docker:24.0.6"
    volumes    = ["/cache", "/certs/client", "/var/run/docker.sock:/var/run/docker.sock"]
  }

  tags = {
    "tf-aws-gitlab-runner:example"           = "runner-default"
    "tf-aws-gitlab-runner:instancelifecycle" = "spot:yes"
  }
}
```
I had the same issue a few weeks ago. I discovered that AWS EC2 Instance Connect wasn't installed in the Amazon Linux 2023 ECS-optimized Amazon Machine Image (AMI). The fleeting implementation uses EC2 Instance Connect to make a temporary SSH public key available through the EC2 instance metadata service, and the SSH daemon checks incoming connections against that key. This doesn't work unless EC2 Instance Connect is installed and the SSH daemon is configured to use it. I managed to fix it with a custom start script that installs EC2 Instance Connect.
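A minimal sketch of such a start script, assuming the installed module version exposes a `start_script` attribute on `runner_worker_docker_autoscaler_instance` that is passed to the worker instances as plain-text cloud-init user data (check the module's `variables.tf` for your version). It also assumes that the `ec2-instance-connect` package from the Amazon Linux 2023 repositories sets up sshd's `AuthorizedKeysCommand` when installed; running `sshd -T | grep -i authorizedkeyscommand` on a worker is one way to confirm.

```hcl
module "runner" {
  # ... all other settings as in the configuration above ...

  runner_worker_docker_autoscaler_instance = {
    root_size            = 16
    monitoring           = true
    private_address_only = false

    # Assumption: plain-text cloud-init user data run on each worker instance
    start_script = <<-EOT
      #!/bin/bash
      # Install EC2 Instance Connect (reported missing from the AL2023 ECS-optimized AMI)
      dnf install -y ec2-instance-connect
      # Restart sshd so it picks up the AuthorizedKeysCommand configuration
      systemctl restart sshd
    EOT
  }
}
```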
I hope this helps.
I usually recommend using the predefined AMIs from
Has anyone been able to solve this yet?

EDIT: I was able to solve it by manually increasing the maximum capacity of the runners' Auto Scaling group.
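Editing the Auto Scaling group by hand is likely to be reverted on the next `terraform apply`, because the module manages that group. A sketch of setting the limit in Terraform instead, assuming (not confirmed in this thread) that the module derives the worker Auto Scaling group's maximum size from `runner_worker.max_jobs`, whose default of 0 would match the "max size of 0" reported in this thread. The value 10 is an arbitrary example.

```hcl
module "runner" {
  # ... all other settings unchanged ...

  runner_worker = {
    type       = "docker-autoscaler"
    ssm_access = true

    # Assumption: used as max_size of the worker Auto Scaling group.
    # 10 is an arbitrary example; size it to your expected job concurrency.
    max_jobs = 10
  }
}
```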
I'm encountering the same issue. For me, manually setting the maximum capacity of the Auto Scaling group also resolved it. However, I now see 3-4 nodes sitting idle for no apparent reason. Is there something I need to do to get rid of them? Ideally there should be 0 nodes when no jobs are running.
I'm having the same issue: the ASG has a max size of 0 by default. Adjusting it manually gets things going.
Also seeing this issue.
Hi team,
Describe the bug
I'm trying to implement the runner fleeting setup from the example https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-fleeting-plugin. But after deploying it, the GitLab runner shows the status "Never contacted".
To Reproduce
First, I created an SSM Parameter Store parameter named gitlab-runner-token containing my runner authentication token.
Then I copied all files from https://github.com/cattle-ops/terraform-aws-gitlab-runner/tree/main/examples/runner-fleeting-plugin and just added a default value for:
I must have missed a step, but I don't understand which one. I don't see anything in the cloud-init log. It looks like nothing has been initialized.
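For comparison, the SSM parameter from the first reproduction step can also be managed in Terraform with the AWS provider's `aws_ssm_parameter` resource and wired into the module. The variable `gitlab_runner_token` is a hypothetical name; note that the token value will be stored in the Terraform state.

```hcl
# Hypothetical input holding the runner authentication token created in GitLab
variable "gitlab_runner_token" {
  type      = string
  sensitive = true
}

resource "aws_ssm_parameter" "gitlab_runner_token" {
  name  = "gitlab-runner-token"
  type  = "SecureString"
  value = var.gitlab_runner_token
}

module "runner" {
  # ... other settings from the example ...

  runner_gitlab = {
    url                                           = var.gitlab_url
    preregistered_runner_token_ssm_parameter_name = aws_ssm_parameter.gitlab_runner_token.name
  }
}
```

Referencing `aws_ssm_parameter.gitlab_runner_token.name` instead of a literal string also gives Terraform an implicit dependency, so the parameter is created before the module uses it.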
After the initialization, I also tried adding the runner manually, and that works. But I still see strange logs in my gitlab-runner service:
```text
gitlab-runner.service - GitLab Runner
     Loaded: loaded (/etc/systemd/system/gitlab-runner.service; enabled; preset: disabled)
    Drop-In: /etc/systemd/system/gitlab-runner.service.d
             └─kill.conf
     Active: active (running) since Mon 2024-09-16 18:34:50 UTC; 1h 18min ago
   Main PID: 25762 (gitlab-runner)
      Tasks: 17 (limit: 1059)
     Memory: 60.9M
        CPU: 7.855s
     CGroup: /system.slice/gitlab-runner.service
             ├─25762 /usr/bin/gitlab-runner run --working-directory /home/gitlab-runner --config /etc/gitlab-runner/config.toml --service gitlab-runner --user gitlab-runner
             └─25778 fleeting-plugin-aws

Sep 16 19:53:21 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:21.991Z [INFO] increasing instances: amount=3 group=aws/eu-west-3/runners-default-asg
Sep 16 19:53:22 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:22.195Z [ERROR] increase instances: group=aws/eu-west-3/runners-default-asg num_requested=3 num_successful=0 err="rpc error: code = Unknown desc = increase instances: operation error Aut>
Sep 16 19:53:27 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:27.062Z [INFO] increasing instances: amount=3 group=aws/eu-west-3/runners-default-asg
Sep 16 19:53:27 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:27.265Z [ERROR] increase instances: group=aws/eu-west-3/runners-default-asg num_requested=3 num_successful=0 err="rpc error: code = Unknown desc = increase instances: operation error Aut>
Sep 16 19:53:32 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:32.088Z [INFO] increasing instances: amount=3 group=aws/eu-west-3/runners-default-asg
Sep 16 19:53:32 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:32.209Z [ERROR] increase instances: group=aws/eu-west-3/runners-default-asg num_requested=3 num_successful=0 err="rpc error: code = Unknown desc = increase instances: operation error Aut>
Sep 16 19:53:37 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:37.038Z [INFO] increasing instances: amount=3 group=aws/eu-west-3/runners-default-asg
Sep 16 19:53:37 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:37.240Z [ERROR] increase instances: group=aws/eu-west-3/runners-default-asg num_requested=3 num_successful=0 err="rpc error: code = Unknown desc = increase instances: operation error Aut>
Sep 16 19:53:42 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:42.062Z [INFO] increasing instances: amount=3 group=aws/eu-west-3/runners-default-asg
Sep 16 19:53:42 ip-10-0-1-12.eu-west-3.compute.internal gitlab-runner[25762]: 2024-09-16T19:53:42.246Z [ERROR] increase instances: group=aws/eu-west-3/runners-default-asg num_requested=3 num_successful=0 err="rpc error: code = Unknown desc = increase instances: operation error Aut>
```