cloud-controller-manager - CrashLoopBackOff #17347


Open
RizwanaVyoma opened this issue Apr 8, 2025 · 3 comments

@RizwanaVyoma

I am using kOps with a GCE cluster.

A recent cluster update automatically changed the cloud-controller-manager image.

Change log:
ManagedFile/cluster.k8s.local-addons-gcp-cloud-controller.addons.k8s.io-k8s-1.23
        Contents
                                  name: KUBERNETES_SERVICE_HOST
                                            value: 127.0.0.1
                                +         image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a
                                -         image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:f575cc54d0ac3abf0c4c6e8306d6d809424e237e51f4a9f74575502be71c607c
                                          imagePullPolicy: IfNotPresent
                                          livenessProbe:

Because of this newly updated image (gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a), the cloud-controller-manager pod is crashing.

Log message in the pod:
flag provided but not defined: -allocate-node-cidrs
Usage of /go-runner:
-also-stdout
useful with log-file, log to standard output as well as the log file
-log-file string
If non-empty, save stdout to this file
-redirect-stderr
treat stderr same as stdout (default true)

Checking the Docker image with

docker run --rm gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a --help

does not list any of the flags below, which are set in the cloud-controller-manager daemonset:

  • args:
    - --allocate-node-cidrs=true
    - --cidr-allocator-type=CloudAllocator
    - --cluster-cidr=************
    - --cluster-name=*************
    - --controllers=*
    - --leader-elect=true
    - --v=2
    - --cloud-provider=gce
    - --use-service-account-credentials=true
    - --cloud-config=/etc/kubernetes/cloud.config
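
The "flag provided but not defined" error above suggests these flags are being parsed by the image's /go-runner wrapper instead of reaching the cloud-controller-manager binary itself. As a rough check (the /cloud-controller-manager path inside the image is a guess on my part, not confirmed against this image), the image's default entrypoint and command can be inspected, and the wrapped binary can be queried directly:

# Show what the image executes by default
docker inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a

# If the entrypoint is /go-runner, bypass it and ask the wrapped binary for its flags
# (assumes the binary is at /cloud-controller-manager, which may not match this image)
docker run --rm --entrypoint /cloud-controller-manager gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:master@sha256:b3ac9d2d9cff8d736473ab0297c57dfb1924b50758e5cc75a80bacd9d6568f8a --help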

Please let us know how to fix this issue, and how to avoid these automatic image version updates. The new image is breaking the cluster.

@nevdullcode

nevdullcode commented Apr 8, 2025

I am encountering this issue too. The following output points to the same problem with the cloud-controller-manager reported by @RizwanaVyoma:

$ kops version
Client version: 1.31.0

$ kops validate cluster --wait 15m
...

Validation Failed
W0408 17:05:03.705924 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
I0408 17:05:14.154048 1052446 gce_cloud.go:307] Scanning zones: [us-east1-b us-east1-c us-east1-d]
INSTANCE GROUPS
NAME                            ROLE            MACHINETYPE     MIN     MAX     SUBNETS
control-plane-us-east1-b        ControlPlane    n1-standard-4   1       1       us-east1
nodes-us-east1-b                Node            n2-standard-8   2       2       us-east1

NODE STATUS
NAME    ROLE    READY

VALIDATION ERRORS
KIND    NAME                                                                                                                                    MESSAGE
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm        machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/control-plane-us-east1-b-mxhm" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk                machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-h9tk" has not yet joined cluster
Machine https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w                machine "https://www.googleapis.com/compute/v1/projects/myprojectname/zones/us-east1-b/instances/nodes-us-east1-b-xt6w" has not yet joined cluster
Pod     kube-system/cloud-controller-manager-pzzk9                                                                                              system-cluster-critical pod "cloud-controller-manager-pzzk9" is not ready (cloud-controller-manager)
Pod     kube-system/coredns-autoscaler-56467f9769-ltzwk                                                                                         system-cluster-critical pod "coredns-autoscaler-56467f9769-ltzwk" is pending
Pod     kube-system/coredns-db7b68989-59cw7                                                                                                     system-cluster-critical pod "coredns-db7b68989-59cw7" is pending

Validation Failed
W0408 17:05:14.944295 1052446 validate_cluster.go:230] (will retry): cluster not yet healthy
Error: validation failed: wait time exceeded during validation
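
For anyone hitting the same validation failure, a quick way to confirm it is the same crash is to pull the previous log of the failing pod named in the output above (substitute whatever pod name your cluster shows):

kubectl -n kube-system logs cloud-controller-manager-pzzk9 --previous
# expected to show "flag provided but not defined: -allocate-node-cidrs" followed by the /go-runner usage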

@rifelpet
Member

rifelpet commented Apr 9, 2025

Can you try setting this in the cluster spec and running kops update cluster --yes to see if that fixes the issue?

spec:
  cloudControllerManager:
    image: gcr.io/k8s-staging-cloud-provider-gcp/cloud-controller-manager:v32.2.4
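
For reference, a minimal sequence to apply that pin (assuming the standard kOps workflow; cluster name and state store come from your environment):

kops edit cluster                    # add the cloudControllerManager.image field shown above
kops update cluster --yes            # re-render and apply the addon manifest
kops rolling-update cluster --yes    # only if kops reports a rolling update is needed
kops validate cluster --wait 15m     # confirm the cluster becomes healthy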

@nevdullcode

@rifelpet Yes, that works. Cluster validation now completes successfully. Thank you!
