

FailedScheduling error when deploying an application in Azure Kubernetes

Question

Wednesday, June 26, 2019 8:14 PM

I ran helm install to install an app whose images are in my Azure Container Registry. One of the pods got stuck in the Pending state, and when I describe it I see the warning below:

Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  24s (x18 over 9m39s)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.

In more detail, these are the events that took place:

vasanth_venkatachalam@Azure:~/clouddrive/.cloudconsole$ kubectl get pods
NAME                                      READY   STATUS    RESTARTS   AGE
cautious-magpie-cmdb-mariadb-0            0/1     Pending   0          6m12s
cautious-magpie-cmdb-post-install-nvxvt   1/1     Running   0          6m12s
vasanth_venkatachalam@Azure:~/clouddrive/.cloudconsole$

kubectl describe pod cautious-magpie-cmdb-mariadb-0

Name:               cautious-magpie-cmdb-mariadb-0
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app=cautious-magpie-cmdb
                    cmdb-dbtype=mariadb
                    controller-revision-hash=cautious-magpie-cmdb-mariadb-6bc7df696d
                    csf-component=cmdb
                    csf-subcomponent=mariadb
                    heritage=Tiller
                    release=cautious-magpie
                    statefulset.kubernetes.io/pod-name=cautious-magpie-cmdb-mariadb-0
                    type=mariadb
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      StatefulSet/cautious-magpie-cmdb-mariadb
Init Containers:
  mariadbinit-user-config:
    Image:      impactcontainerregistry.azurecr.io/cmdb/mariadb:4.8-2.1315
    Port:       <none>
    Host Port:  <none>
    Command:
      bash
      -c
      cp /import-cm/mysqld.site /import/
      cp /import-users/database_users.json /import/ 2>/dev/null | true
      sed -i -e '$a\ $(ls -d /import/* | grep -v db.d) | true

    Limits:
      cpu:     100m
      memory:  64Mi
    Requests:
      cpu:        100m
      memory:     64Mi
    Environment:  <none>
    Mounts:
      /import from import (rw)
      /import-cm from import-cm (rw)
      /import-users from import-users (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vvwbw (ro)
Containers:
  mariadb:
    Image:       impactcontainerregistry.azurecr.io/cmdb/mariadb:4.8-2.1315
    Ports:       3306/TCP, 4444/TCP, 4567/TCP, 4568/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP
    Limits:
      cpu:     1
      memory:  768Mi
    Requests:
      cpu:     250m
      memory:  256Mi
    Liveness:  exec [bash -c /usr/bin/mariadb_db --verify-access
] delay=300s timeout=5s period=10s #success=1 #failure=6
    Readiness:  exec [bash -c /usr/bin/mariadb_db --verify-access
] delay=10s timeout=1s period=15s #success=1 #failure=3
    Environment:
      CLUSTER_TYPE:        simplex
      CLUSTER_NAME:        cautious-magpie
      REQUIRE_USERS_JSON:  yes
    Mounts:
      /chart from cluster-cm (rw)
      /import from import (rw)
      /import/db.d from importdb (rw)
      /mariadb from datadir (rw)
      /mariadb/backup from backupdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vvwbw (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-cautious-magpie-cmdb-mariadb-0
    ReadOnly:   false
  backupdir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  backupdir-cautious-magpie-cmdb-mariadb-0
    ReadOnly:   false
  cluster-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cautious-magpie-cmdb-mariadb-cluster
    Optional:  false
  import:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  import-cm:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cautious-magpie-cmdb-mariadb-config
    Optional:  false
  import-users:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cautious-magpie-cmdb-mariadb-initialusers
    Optional:    true
  importdb:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cautious-magpie-cmdb-mariadb-databases
    Optional:  false
  default-token-vvwbw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-vvwbw
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  35s (x18 over 7m)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.

What does this "FailedScheduling ... 2 node(s) didn't match node selector" error mean, and how can I get around it?

All replies (5)

Thursday, June 27, 2019 9:51 PM ✅Answered | 1 vote

Update: I got past the error. The problem was that the values.yaml file also had nodeAffinity settings, and the key/value pair it was specifying also had to be changed:

nodeAffinity:
    enabled: true
    key: beta.kubernetes.io/os
    value: Linux
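
With the chart template quoted further down in this thread, that values block renders into a required node affinity term on the StatefulSet's pod template, roughly as sketched below (the exact path depends on the chart). One caveat: label matching is case-sensitive, so the value has to match the node label exactly; on AKS the beta.kubernetes.io/os label carries the lowercase value linux.

# rendered pod spec fragment (illustrative) -- what the nodeAffinity values above produce via the template
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: beta.kubernetes.io/os
          operator: In
          values:
          - "linux"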


Thursday, June 27, 2019 4:54 AM | 1 vote

Hi Vasanth,

It looks like your nodes may not be in the Ready state.

Execute "kubectl get nodes" and check whether the nodes are in the Ready state or not.

If they are not in the Ready state, then describe the node and check the events. A common reason would be stopped nodes.

If they are in the Ready state, then we need to look elsewhere.

From the pod's description, you don't have a node selector.

I see a StatefulSet in the labels. Are you using a StatefulSet?
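
For comparison, when a pod does have one, it shows up in the pod spec like this (a minimal fragment, using the AKS os label as an example):

# pod spec fragment (illustrative) -- what a node selector looks like when one is set
spec:
  nodeSelector:
    beta.kubernetes.io/os: linux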


Thursday, June 27, 2019 2:55 PM

The nodes are in the Ready state when this happens:

vasanth_venkatachalam@Azure:~/clouddrive/.cloudconsole$ kubectl get nodes
NAME                       STATUS   ROLES   AGE     VERSION
aks-agentpool-30689406-1   Ready    agent   8d      v1.13.5
aks-agentpool-30689406-2   Ready    agent   6d18h   v1.13.5
vasanth_venkatachalam@Azure:~/clouddrive/.cloudconsole$
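
The AKS agent nodes also carry the standard os label by default; for reference, the relevant metadata.labels fragment of one of these nodes looks roughly like this (values shown are typical defaults for a Linux node pool, listed for illustration):

# kubectl get node aks-agentpool-30689406-1 -o yaml -- metadata.labels fragment (illustrative)
metadata:
  labels:
    agentpool: agentpool
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/hostname: aks-agentpool-30689406-1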

Someone suggested I should do this:

The nodeSelector is missing "beta.kubernetes.io/os: linux". Please correct the deployment by including this information. To correct the current pod, you can run the following command:

kubectl patch pod <pod name> -p '{"spec":{"template":{"spec":{"nodeSelector":{"beta.kubernetes.io/os": "linux"}}}}}'

I first tried patching the pod that was stuck in pending state by running the above command:

kubectl patch pod exacerbated-dachshund-cmdb-mariadb-0 -p '{"spec":{"template":{"spec":{"nodeSelector":{"beta.kubernetes.io/os": "linux"}}}}}'
pod/exacerbated-dachshund-cmdb-mariadb-0 patched (no change)
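
That "(no change)" is expected in hindsight: a Pod object has no spec.template field, so that part of the patch has nothing to merge into, and a Pod's nodeSelector can't be modified after the pod has been created anyway. The selector (or affinity) needs to live on the controller that recreates the pods, i.e. the StatefulSet. A minimal sketch of the fragment that would need to end up in its spec (applied, for example, with kubectl edit statefulset exacerbated-dachshund-cmdb-mariadb, or more permanently through the chart):

# StatefulSet spec fragment (illustrative) -- the selector belongs on the controller's pod template
spec:
  template:
    spec:
      nodeSelector:
        beta.kubernetes.io/os: linux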

I then restarted that pod with:

vasanth_venkatachalam@Azure:~/clouddrive/.cloudconsole$ kubectl delete pod exacerbated-dachshund-cmdb-mariadb-0

pod "exacerbated-dachshund-cmdb-mariadb-0" deleted

But the pod stays stuck in pending state:

vasanth_venkatachalam@Azure:~/clouddrive/.cloudconsole$ kubectl get pods
NAME                                            READY   STATUS      RESTARTS   AGE
exacerbated-dachshund-cmdb-mariadb-0            0/1     Pending     0          10s
exacerbated-dachshund-cmdb-post-install-nnzf5   0/1     Completed   0          18h

And I see the same error in the events log:

3m51s       Warning   FailedScheduling          Pod           0/2 nodes are available: 2 node(s) didn't match node selector.
58s         Warning   FailedScheduling          Pod           0/2 nodes are available: 2 node(s) didn't match node selector.
58s         Normal    SuccessfulCreate          StatefulSet   create Pod exacerbated-dachshund-cmdb-mariadb-0 in StatefulSet exacerbated-dachshund-cmdb-mariadb successful
0s    Warning   FailedScheduling   Pod   0/2 nodes are available: 2 node(s) didn't match node selector.
0s    Warning   FailedScheduling   Pod   0/2 nodes are available: 2 node(s) didn't match node selector.
0s    Warning   FailedScheduling   Pod   0/2 nodes are available: 2 node(s) didn't match node selector.
0s    Warning   FailedScheduling   Pod   0/2 nodes are available: 2 node(s) didn't match node selector.
0s    Warning   FailedScheduling   Pod   0/2 nodes are available: 2 node(s) didn't match node selector.



Thursday, June 27, 2019 3:02 PM

Also, yes, I see some stateful set .yaml files under the templates/ directories.

I'm not sure where to make the permanent fix this person is mentioning here:

The nodeSelector is missing "beta.kubernetes.io/os: linux". Please correct the deployment by including this information.

Is this supposed to go in the helm charts?

In the values.yaml I see the lines:

## Specifies the type of anti-affinity for scheduling pods to nodes.
## If hard, pods cannot be scheduled together on nodes, if soft,
## best-effort to avoid sharing nodes will be done
nodeAntiAffinity: soft

  ## Node labels for pod assignment
  ## Ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
  nodeAffinity:
    enabled: false
    key: is_worker
    value: true

And then there are stateful set files under the "templates" directory containing these lines. Am I supposed to modify one of these lines for the permanent fix mentioned above?

     affinity:
      {{- if .Values.<nameofapp>.nodeAffinity.enabled }}
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: {{ .Values.<nameofapp>.nodeAffinity.key }}
                operator: In
                values:
                - {{ quote .Values.<nameofapp>.nodeAffinity.value }}
      {{- end }}

           podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:

selector:
    matchLabels:
      {{- include "<name of app>.labels" . | indent 6 }}
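
Given that template, the affinity block is only rendered when .Values.<nameofapp>.nodeAffinity.enabled is true, and the key/value pair gets copied into a required nodeAffinity term, so the permanent fix belongs in the values file rather than in the template itself. A sketch of the override, with the key/value chosen to match a label the AKS nodes actually carry (the exact nesting depends on where nodeAffinity sits in this chart's values.yaml):

# values override fragment (illustrative; <nameofapp> stands for the chart's real top-level key)
<nameofapp>:
  nodeAffinity:
    enabled: true
    key: beta.kubernetes.io/os
    value: linux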


Thursday, June 27, 2019 7:18 PM

I made this change to the values.yaml file but I still get the same error when I do a helm install:

I added this line:

nodeSelector: {beta.kubernetes.io/os:linux}

When I describe the pod (kubectl describe) I still see Node-Selectors: <none>, followed by the same error.

Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  25s (x18 over 6m20s)  default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.

To be clear, the helm command I ran was:

helm install cmdb -f cmdb-version0.yaml

where cmdb-version0.yaml is my values override file, which I generated by running helm inspect values <folder> and then modifying the resulting file.
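
Two things are worth noting here. First, the template excerpt in the previous reply only consumes .Values.<nameofapp>.nodeAffinity; if nothing in the templates references a nodeSelector value, a bare nodeSelector: line in the override file is silently ignored, which would explain why the pod still shows Node-Selectors: <none>. (Also, if the inline form is used, YAML needs a space after the colon, i.e. {beta.kubernetes.io/os: linux}.) For a values-level nodeSelector to take effect, the pod template in the chart would need something along these lines (a sketch of the common Helm idiom, with indentation adjusted to the chart):

      # hypothetical addition to the StatefulSet template -- only needed if the chart should honour .Values.nodeSelector
      {{- with .Values.nodeSelector }}
      nodeSelector:
{{ toYaml . | indent 8 }}
      {{- end }}

Otherwise, driving the existing nodeAffinity block from the values file, as in the answered update above, is the simpler route.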