Fixing the problem of a k8s pod endlessly restarting after being deleted

First, check the node's logs at /var/log/messages:

Jun  1 11:32:34 apm-slave03 dockerd-current: time="2018-06-01T11:32:34.830329738+08:00" level=error msg="Handler for GET /containers/b532d65bd2ff380035560a33e435414b66ccfbfbbf6f3c9d51cb2f0add57b2d2/json returned error: No such container: b532d65bd2ff380035560a33e435414b66ccfbfbbf6f3c9d51cb2f0add57b2d2"
Jun  1 11:32:44 apm-slave03 kubelet: I0601 11:32:44.160859   20744 docker_manager.go:2495] checking backoff for container "kubernetes-dashboard" in pod "kubernetes-dashboard-latest-4167338039-95kb9"
Jun  1 11:32:44 apm-slave03 kubelet: I0601 11:32:44.161188   20744 docker_manager.go:2509] Back-off 5m0s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-latest-4167338039-95kb9_kube-system(c2097a18-654a-11e8-8d29-005056bc2ad1)
Jun  1 11:32:44 apm-slave03 kubelet: E0601 11:32:44.161302   20744 pod_workers.go:184] Error syncing pod c2097a18-654a-11e8-8d29-005056bc2ad1, skipping: failed to "StartContainer" for "kubernetes-dashboard" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-latest-4167338039-95kb9_kube-system(c2097a18-654a-11e8-8d29-005056bc2ad1)"
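
If kubelet runs as a systemd unit on this node (an assumption; the original only reads /var/log/messages), the same back-off messages can also be pulled straight from the journal, for example:

# assumes a systemd-managed kubelet; adjust the unit name if yours differs
journalctl -u kubelet --since "10 minutes ago" | grep -i "back-off"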

The log doesn't reveal much, so check the pods next:

kubectl get pods -n kube-system 
NAME                                           READY     STATUS              RESTARTS   AGE
kubernetes-dashboard-latest-4167338039-bbwp4   0/1       ContainerCreating   0          47s
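
A side note not in the original: adding -o wide to the same command shows which node the pod was scheduled onto, i.e. which node's /var/log/messages is worth reading:

# -o wide adds NODE and IP columns to the listing
kubectl get pods -n kube-system -o wide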

Delete it:

kubectl delete pod kubernetes-dashboard-latest-4167338039-bbwp4 -n kube-system

Then check again:

kubectl get pods -n kube-system | grep -v Running                             
NAME                                           READY     STATUS    RESTARTS   AGE
kubernetes-dashboard-latest-4167338039-95kb9   0/1       Error     0          4s

Surprisingly, yet another one had already come up. I deleted it five times in a row and it kept coming back, which was baffling.
Describe one of the pods:

 kubectl describe pod kubernetes-dashboard-latest-4167338039-95kb9 -n kube-system                                             
Name:           kubernetes-dashboard-latest-4167338039-95kb9
Namespace:      kube-system
Node:           apm-slave03/10.10.202.159
Start Time:     Fri, 01 Jun 2018 11:20:39 +0800
Labels:         k8s-app=kubernetes-dashboard
                kubernetes.io/cluster-service=true
                pod-template-hash=4167338039
                version=latest
Status:         Running
IP:             10.0.48.2
Controllers:    ReplicaSet/kubernetes-dashboard-latest-4167338039
Containers:
  kubernetes-dashboard:
    Container ID:       docker://fe222c62c496d1348b9da4d17da474721d941279c7bd476596a0e041353ccd55
    Image:              registry.cn-hangzhou.aliyuncs.com/google-containers/kubernetes-dashboard-amd64:v1.4.2
    Image ID:           docker-pullable://registry.cn-hangzhou.aliyuncs.com/google-containers/kubernetes-dashboard-amd64@sha256:8c9cafe41e0846c589a28ee337270d4e97d486058c17982314354556492f2c69
    Port:               9090/TCP
    Args:
      --apiserver-host=http://apm-slave02:8080
    Limits:
      cpu:      100m
      memory:   50Mi
    Requests:
      cpu:                      100m
      memory:                   50Mi
    State:                      Terminated
      Reason:                   Error
      Exit Code:                1
      Started:                  Fri, 01 Jun 2018 11:21:04 +0800
      Finished:                 Fri, 01 Jun 2018 11:21:06 +0800
    Last State:                 Terminated
      Reason:                   Error
      Exit Code:                1
      Started:                  Fri, 01 Jun 2018 11:20:43 +0800
      Finished:                 Fri, 01 Jun 2018 11:20:45 +0800
    Ready:                      False
    Restart Count:              2
    Liveness:                   http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Volume Mounts:              
    Environment Variables:      
Conditions:
  Type          Status
  Initialized   True 
  Ready         False 
  PodScheduled  True 
No volumes.
QoS Class:      Guaranteed
Tolerations:    
Events:
  FirstSeen     LastSeen        Count   From                    SubObjectPath                           Type            Reason          Message
  ---------     --------        -----   ----                    -------------                           --------        ------          -------
  28s           28s             1       {default-scheduler }                                            Normal          Scheduled       Successfully assigned kubernetes-dashboard-latest-4167338039-95kb9 to apm-slave03
  28s           28s             1       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal          Created         Created container with docker id f880c9e76de7; Security:[seccomp=unconfined]
  27s           27s             1       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal          Started         Started container with docker id f880c9e76de7
  24s           24s             1       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal          Created         Created container with docker id 16f258977612; Security:[seccomp=unconfined]
  24s           24s             1       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal          Started         Started container with docker id 16f258977612
  22s           18s             2       {kubelet apm-slave03}                                           Warning         FailedSync      Error syncing pod, skipping: failed to "StartContainer" for "kubernetes-dashboard" with CrashLoopBackOff: "Back-off 10s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-latest-4167338039-95kb9_kube-system(c2097a18-654a-11e8-8d29-005056bc2ad1)"

  28s   3s      4       {kubelet apm-slave03}                                           Warning MissingClusterDNS       kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
  28s   3s      3       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal  Pulled                  Container image "registry.cn-hangzhou.aliyuncs.com/google-containers/kubernetes-dashboard-amd64:v1.4.2" already present on machine
  3s    3s      1       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal  Created                 Created container with docker id fe222c62c496; Security:[seccomp=unconfined]
  3s    3s      1       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Normal  Started                 Started container with docker id fe222c62c496
  22s   0s      3       {kubelet apm-slave03}   spec.containers{kubernetes-dashboard}   Warning BackOff                 Back-off restarting failed docker container
  0s    0s      1       {kubelet apm-slave03}                                           Warning FailedSync              Error syncing pod, skipping: failed to "StartContainer" for "kubernetes-dashboard" with CrashLoopBackOff: "Back-off 20s restarting failed container=kubernetes-dashboard pod=kubernetes-dashboard-latest-4167338039-95kb9_kube-system(c2097a18-654a-11e8-8d29-005056bc2ad1)"
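
One check the original skips: since the container exits with code 1 within a couple of seconds, its own output is the most direct clue to the crash. --previous fetches the logs of the last terminated instance (a sketch, not part of the original session):

# --previous reads the logs of the previously crashed container instance
kubectl logs kubernetes-dashboard-latest-4167338039-95kb9 -n kube-system --previous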

Still nothing that pointed at the cause, so I turned to Google. The conclusion: the pod is managed by a Deployment, so every time it is deleted the controller simply recreates it; to remove it for good, the Deployment itself has to be deleted. First, take a look at what is there:

kubectl get deployments --all-namespaces
NAMESPACE     NAME                          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   kubernetes-dashboard-latest   1         1         1            0           2d
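
This deployment owns a ReplicaSet whose desired replica count is 1, which is why every deleted pod is immediately replaced. Listing the ReplicaSets makes that visible (an extra step, not in the original):

# the ReplicaSet created by the deployment keeps the pod count at its DESIRED value
kubectl get rs -n kube-system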

Looking at the listing above, the Name turned out to be the deployment from my very first attempt, not the one I had just created, so (through gritted teeth) delete it:

kubectl delete deployments kubernetes-dashboard-latest -n kube-system
deployment "kubernetes-dashboard-latest" deleted

kubectl get deployments --all-namespaces
No resources found.

kubectl get pods -n kube-system | grep -v Running                              
No resources found.

The endlessly restarting pod problem is finally solved!

Recreate the deployment and check it:
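
The original does not show the create command itself; assuming the dashboard manifest is saved locally as kubernetes-dashboard.yaml (a hypothetical filename), it would be something along these lines, followed by the check below:

# kubernetes-dashboard.yaml is a placeholder name for the dashboard Deployment manifest
kubectl create -f kubernetes-dashboard.yaml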

kubectl get deployments --all-namespaces                             
NAMESPACE     NAME                   DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
kube-system   kubernetes-dashboard   1         1         1            1           5s
