cloud-controller-manager - CrashLoopBackOff #17347


Open
RizwanaVyoma opened this issue Apr 8, 2025 · 3 comments

@RizwanaVyoma

I am using kOps with a GCE cluster.

A recent cluster update automatically changed the cloud-controller-manager image.

Change log:
ManagedFile/cluster.k8s.local-addons-gcp-cloud-controller.addons.k8s.io-k8s-1.23
        Contents
                                  name: KUBERNETES_SERVICE_HOST
                                            value: 127.0.0.1
                                +         image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a
                                -         image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:f575cc54d0ac3abf0c4c6e8306d6d809424e237e51f4a9f74575502be71c607c
                                          imagePullPolicy: IfNotPresent
                                          livenessProbe:

Because of this newly updated image (gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a), the cloud-controller-manager pod is crashing.

Log message in the pod:
flag provided but not defined: -allocate-node-cidrs
Usage of /go-runner:
-also-stdout
useful with log-file, log to standard output as well as the log file
-log-file string
If non-empty, save stdout to this file
-redirect-stderr
treat stderr same as stdout (default true)

Checking the Docker image with

docker run --rm gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a --help

does not list any of the flags below, which are set in the cloud-controller-manager daemonset:

  • args:
    - --allocate-node-cidrs=true
    - --cidr-allocator-type=CloudAllocator
    - --cluster-cidr=************
    - --cluster-name=*************
    - --controllers=*
    - --leader-elect=true
    - --v=2
    - --cloud-provider=gce
    - --use-service-account-credentials=true
    - --cloud-config=/etc/kubernetes/cloud.config
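
The "flag provided but not defined" error above suggests these flags are being parsed by the image's /go-runner wrapper instead of reaching the cloud-controller-manager binary itself. As a rough check (the /cloud-controller-manager path inside the image is a guess on my part, not confirmed against this image), the image's default entrypoint and command can be inspected, and the wrapped binary can be queried directly:

# Show what the image executes by default
docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a

# If the entrypoint is /go-runner, bypass it and ask the wrapped binary for its flags
# (assumes the binary is at /cloud-controller-manager, which may not match this image)
docker run --rm --entrypoint /cloud-controller-manager gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a --help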

Please let us know how to fix this issue, and how to avoid these automatic image version updates. The new image is breaking the cluster.

@nevdullcode

nevdullcode commented Apr 8, 2025

I am encountering this issue too. The following output points to the same problem with the cloud-controller-manager reported by @RizwanaVyoma:

$ kops version
Client version: 1.31.0

$ kops validate cluster --wait 15m
...

Validation Failed
W0408 17:05:03.705924 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
I0408 17:05:14.154048 1052446 gce_cloud.go:307] Scanning zones: [us-east1-b us-east1-c us-east1-d]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-east1-b        ControlPlane    n1-standard-4   1       1       us-east1
nodes-us-east1-b                Node            n2-standard-8   2       2       us-east1

NODE STATUS
NAME    ROLE    READY

VALIDATION ERRORS
KIND    NAME                                                                                                                                    MESSAGE
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm        machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk                machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w                machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w" has not yet joined cluster
Pod     kube-system/cloud-controller-manager-pzzk9                                                                                              system-cluster-critical pod "cloud-controller-manager-pzzk9" is not ready (cloud-controller-manager)
Pod     kube-system/coredns-autoscaler-56467f9769-ltzwk                                                                                         system-cluster-critical pod "coredns-autoscaler-56467f9769-ltzwk" is pending
Pod     kube-system/coredns-db7b68989-59cw7                                                                                                     system-cluster-critical pod "coredns-db7b68989-59cw7" is pending

Validation Failed
W0408 17:05:14.944295 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
Error: validation failed: wait time exceeded during validation
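
For anyone hitting the same validation failure, a quick way to confirm it is the same crash is to pull the previous log of the failing pod named in the output above (substitute whatever pod name your cluster shows):

kubectl -n kube-system logs cloud-controller-manager-pzzk9 --previous
# expected to show "flag provided but not defined: -allocate-node-cidrs" followed by the /go-runner usage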

@rifelpet
Member

rifelpet commented Apr 9, 2025

Can you try setting this in the cluster spec and running kops update cluster --yes to see if that fixes the issue?

spec:
  cloudControllerManager:
    image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v32.2.4
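
For reference, a minimal sequence to apply that pin (assuming the standard kOps workflow; cluster name and state store come from your environment):

kops edit cluster                    # add the cloudControllerManager.image field shown above
kops update cluster --yes            # re-render and apply the addon manifest
kops rolling-update cluster --yes    # only if kops reports a rolling update is needed
kops validate cluster --wait 15m     # confirm the cluster becomes healthy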

@nevdullcode

@rifelpet Yes, that works. Cluster validation now completes successfully. Thank you!
