Kubernetes学习(Kubernetes踩坑记)

记录在使用Kubernetes中遇到的各种问题及解决方案, 好记性不如烂笔头

不定期更新

kubelet日志错误: Unable to create endpoint (status 429)

1	Unable to create endpoint: response status code does not match any response statuses defined for this endpoint in the swagger spec (status 429)

原因: 并发创建的pod数太多触发了cilium的api-rate-limit配置上限, 更详细的说明可参考以下链接
解决: 参考cilium在kubernetes中的生产实践六(cilium排错指南)之api-rate-limit

kubelet中提示: no relationship found between node xxx and this object

1	User "system:node:xxx" cannot list resource "configmap" in API group in the namespace "xxx", no relationship found between node xxx and this object

原因: 由于集群开启了Node Authorization Mode,所以节点上的kubelet能操作的权限是跟节点上运行pod相关联的，可以通过以下的方式验证

1
2
3

# 登录到节点kubectl
kubectl --kubeconfig=/etc/kubernetes/kubelet.conf -n xxx can-i list cm/xxx
# 如果有权限则提示yes, 没有则为no

解决: 如果需要访问，则可通过一个pod挂载对应的cm调度到该节点上，那么在这个node上即可通过kubelet访问到相关的cm

参考:

kube-controller-manager日志出现: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1

原因: apiservice存在Failed的service

解决:

1
2
3

kubectl get apiservice
# 查看是否存在ServiceNotFound, 删除对应的service即可
kubectl delete apiservice {name}

kubelet启动时提示Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id

原因: kubelet的数据目录存在脏容器数据

解决: 使用以下命令找到脏容器，删除后重启kubelet

# 894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028 是上面报错中的ID
docker ps -a --filter "label=io.kubernetes.sandbox.id=894f35dca3eda57adef28b69acd0607efdeb34e8814e87e196bc163305576028"
# 根据上述ID删除容器

docker rm ID

# 重启kubelet
systemctl restart kubelet

prometheus logs: compaction failed, 对数的性质uption in segment xxxx at yyyy: unexpected non-zero byte in padded page

原因: prometheus在对wal进行压缩时出现segment错误，导致创建checkout失败

解决: 在prometheus持久化目录下删除上述xxxx的目录，然后重启prometheus

重启后可能会出现unexpected gap to last checkpoint, expect: xxx, requested: yyy

需要将checkout目录也进行删除，然后重启prometheus实例

is forbidden: User xxx cannot get resource “services/proxy”

在rancher中使用非admin用户无法显示grafana的 workload metrics, 请求中提示: services http:rancher-monitoring-grafana:80 is forbidden: User xxx cannot get resource “services/proxy” in API group “” in the namespace cattle-monitoring-system

原因: 需要为该用户在cattle-monitoring-system ns中授权 services/proxy权限, 同时该用户所对应的角色需要继承 project-monitoring-view角色，这样非admin用户才能看到metrics菜单

参考: https://forums.rancher.com/t/cluster-member-cant-see-use-grafana-or-monitoring-stuff/15814

pod状态提示UnexpectedAdmissionError

原因: 在排查的过程中，发现这个问题涉及的东西太多，写了篇专门的文章来说明这个问题，可参考Kubernetes-pod-status-is-UnexpectedAdmissionError

nvidia-device-plugin 提示bind: address already in use

原因: 这个错误提示其实是有歧义的, 一般看到bind: address already in use都会认为是不是地址端口被占用了, 在这里其实不是，正常来讲nvidia-device-plugin在正常退出后会将节点上的nvidia.sock文件一起删除，启动时会自动创建该文件, 但如果出现退出后nvidia.sock文件还存在，这个时候启动nvidia-device-plugin就会提示上述报错

解决: 手动删除节点上的nvidia.sock,然后重启nvidia-device-plugin即可

prometheus提示 /metrics/resource/v1alpha1 404

原因: 这是因为[/metrics/resource/v1alpha1]是在v1.14中才新增的特性，而当前kubelet版本为1.13

解决: 升级k8s的版本，这里要注意的是kubelet的版本不能为api-server的高，所以不能只升级kubelet.

Error from server (Forbidden): pods “xxx” is forbidden: cannot exec into or attach to a privileged container

原因: 排查两个方面，是否有psp，第二个是否启用了相关的admission

解决: 在本case中，因安全因素，开启了DenyEscalatingExec 这个admission，从api-server的配置–enable-admission-plugins中上去掉DenyEscalatingExec 即可

参考: https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/

kubeadm join提示unable to fetch the kubeadm-config ConfigMap

[discovery] Successfully established connection with API Server "xxx.xxx.xxx.xxx:16443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://127.0.0.1:16443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 127.0.0.1:16443: connect: connection refused

原因: 127.0.0.1:16443是apiserver的VIP,从报错信息来看, 对127.0.0.1:16443的访问被拒绝了, 但是在apiserver本地curl这个地址又是没问题的，还是非常诡异，可以通过以下方式解决了

解决: 请确认好kubeadm join时会访问的两个配置文件中的apiserver地址是否正确

kubectl -n kube-system get cm kubeadm-config -oyaml
# 其中的controlPlaneEndpoint地址

kubectl edit cm cluster-info -oyaml -n kube-public
# 其中的server地址

参考: https://github.com/kubernetes/kubeadm/issues/1596

CRD spec.versions: Invalid value

原因: CRD yaml文件中apiVersion与versions中的版本不对应

参考: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/

删除namespaces时Terminating，无法强制删除且无法在该ns下创建对象

原因: ns处于terminating时hang住了，使用--grace-period=0 -- force强制删除也无效

解决:

# 保存现在的ns json
kubectl get ns xxxx -o json > /tmp/temp.json
# 编辑temp.json，将其中的spec.finalizer字段删除保存
# 导出k8s访问密钥
echo $(kubectl config view --raw -oyaml | grep client-cert  |cut -d ' ' -f 6) |base64 -d > /tmp/client.pem
echo $(kubectl config view --raw -oyaml | grep client-key-data  |cut -d ' ' -f 6 ) |base64 -d > /tmp/client-key.pem
echo $(kubectl config view --raw -oyaml | grep certificate-authority-data  |cut -d ' ' -f 6  ) |base64 -d > /tmp/ca.pem
# 解决namespace Terminating，根据实际情况修改<namespaces>
curl --cert /tmp/client.pem --key /tmp/client-key.pem --cacert /tmp/ca.pem -H "Content-Type: application/json" -X PUT --data-binary @/tmp/temp.json https://xxx.xxx.xxx.xxx:6443/api/v1/namespaces/<namespaces>/finalize

docker 启动时提示no sockets found via socket activation

解决: 在start docker前先执行systemctl unmask docker.socket即可

Prometheus opening storage failed: invalid block sequence

原因: 这个需要排查prometheus持久化目录中是否存在时间超出设置阈值的时间段的文件，删掉后重启即可

kubelet提示: The node was low on resource: ephemeral-storage

原因: 节点上kubelet的配置路径超过阈值会触发驱逐，默认情况下阈值是85%

解决: 或者清理磁盘释放资源，或者通过可修改kubelet的配置参数imagefs.available来提高阈值,然后重启kubelet.

参考: https://cloud.tencent.com/developer/article/1456389

kubectl查看日志时提示: Error from server: Get https://xxx:10250/containerLogs/spring-prod/xxx-0/xxx: dial tcp xxx:10250: i/o timeout

原因: 目地机器的iptables对10250这个端口进行了drop，如下图

1	iptables-save -L INPUT –-line-numbers

解决: 删除对应的规则

1	iptables -D INPUT 10

Service解析提示 Temporary failure in name resolution

原因: 出现这种情况很奇怪，现象显示就是域名无法解析，全格式的域名能够解析是因为在pod的/etc/hosts中有全域名的记录,那么问题就出在于corddns解析上，coredns从日志来看，没有任何报错，但是从pod的状态来看，虽然处于Running状态，但是0/1可以看出coredns并未处于ready状态.

可以查看ep记录，会发现endpoint那一栏是空的，这也就证实了k8s把coredns的状态分为了notready状态，所以ep才没有记录，经过与其它环境比较后发现跟配置有关，最终定位在coredns的配置文件上,在插件上需要加上ready

解决: 在cm的配置上添加read插件，如下图

# ... 省略
data:
  Corefile: |
    .:53 {
        errors
        health
        ready  # 加上该行后问题解决
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream /etc/resolv.conf
          fallthrough in-addr.arpa ip6.arpa
        }
       # ... 省略

关于coredns的ready插件的使用,可以参考这里

总结起来就是使用ready来表明当前已准备好可以接收请求，从codedns的yaml文件也可以看到有livenessProbe

使用Kubectl命令行时提示: Unable to connect to the server: x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0

原因: 这个跟本地的go环境有关

解决: 在使用kubectl前使用命令export GODEBUG=x509ignoreCN=0即可

namespaces "kube-system" is forbidden: this namespace may not be deleted

原因: kube-system是集群中受保护的ns, 被禁止删除，主要是防止误操作，如果需要删除的话，可以使用–force

参考: https://github.com/kubernetes/kubernetes/pull/62167/files

unknown field volumeClaimTemplates

原因: 提示这个错误的原因是资源对象是deployment, 而deployment本就是无状态的，所以也就没有使用pv这一说法了，可以参考api

参考: deploymentspec-v1-apps

CoreDNS提示Loop (127.0.0.1:38827 -> :53) detected for zone “.”

原因: CoreDNS所在的宿主机上/etc/resolv.conf中存在有127.0.xx的nameserver,这样会造成解析死循环.

解决: 修改宿主机/etc/resolv.conf或者将CoreDNS的configmap中的forward修改为一个可用的地址, 如8.8.8.8

hostPath volumes are not allowed to be used

原因: 集群中存在psp禁止pod直接挂载hostpath.

解决: 通过添加以下的psp规则来允许或者删除存在的psp都可

apiVersion: extensions/v1beta1
kind: PodSecurityPolicy
metadata:
  name: auth-privilege-psp
spec:
  allowPrivilegeEscalation: true
  allowedHostPaths:
  - pathPrefix: /
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: RunAsAny
  hostNetwork: true
  hostPID: true
  hostPorts:
  - max: 9796
    min: 9796
  privileged: true
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: RunAsAny
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - downwardAPI
  - persistentVolumeClaim
  - hostPath

container has runAsNonRoot and image has non-numeric user (grafana), cannot verify user is non-root

原因: 这是由于在deploy中设置了securityContext: runAsNonRoot: true, 在这种情况下，当pod启动时，使用的默认用户,比如上面的grafana，k8s无法确定他是不是root用户

解决: 指定securityContext:runAsUser: 1000, 随便一个id号即可, 只要不是0(0代表root)

参考: https://stackoverflow.com/questions/51544003/using-runasnonroot-in-kubernetes

OCI runtime create failed: no such file or directory

原因: /var/lib/kubelet/pod下的数据目录已经损坏.

解决: 删除对应的目录即可

镜像拉取时出现ImageInspectError

原因: 这种情况下一般都是镜像损坏了

解决: 把相关的镜像删除后重新拉取

kubelet日志提示: node not found

原因: 这个报错只是中间过程，真正的原因在于apiserver没有启动成功，导致会一直出现这个错误

解决: 排查kubelet与apiserver的连通是否正常

OCI runtime create failed: executable file not found in PATH

原因: 在path中没有nvidia-container-runtime-hook这个二进制文件，可能跟本人删除nvidia显卡驱动有关.

解决: nvidia-container-runtime-hook是docker nvidia的runtime文件，重新安装即可.

Nginx Ingress Empty address

1
2
3

# kubectl get ingress
NAME         HOSTS                                       ADDRESS   PORTS   AGE
prometheus   prometheus.1box.com                                   80      31d

会发现address中的ip是空的，而查看生产环境时却是有ip列表的.

原因: 这个其实不是一个错误，也不影响使用，原因在于测试环境中是不存在LoadBalance类型的svc, 如果需要address中显示ip的话需要做些额外的设置

解决:

在nginx controller的容器中指定启动参数-report-ingress-status
在nginx controller引用的configmap中添加external-status-address: "10.164.15.220"

这样的话,在address中变会显示10.164.15.220了

参考:

https://github.com/nginxinc/kubernetes-ingress/issues/587

https://docs.nginx.com/nginx-ingress-controller/configuration/global-configuration/reporting-resources-status/

kubelet: but volume paths are still present on disk

原因: 这种pod已经被删除了，但是volume还存在于disk中

解决: 删除对应的目录/var/lib/kubelet/pods/3cd73...

参考: https://github.com/longhorn/longhorn/issues/485

PLEG is not healthy

原因: 宿主机上面跑的容器太多，导致pod无法在3m钟内完成生命周期检查

解决: PLEG(Pod Lifecycle Event Generator)用于kublet同步pod生命周期，本想着如果是因为时间短导致的超时，那是不是可以直接调整这个时间呢? 查看kubelet的源码发现不太行，3m时间是写在代码里的因此无法修改，当然修改再编译肯定没问题，但成本太大，所以只得优化容器的调度情况.

参考: https://developers.redhat.com/blog/2019/11/13/pod-lifecycle-event-generator-understanding-the-pleg-is-not-healthy-issue-in-kubernetes/

metrics-server: 10255 connection refused

unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:k8s-node-49: unable to fetch metrics from Kubelet k8s-node-49 (xxx.xxx.xxx.49): Get http://xxx.xxx.xxx.49:10255/stats/summary?only_cpu_and_memory=true: dial tcp xxx.xxx.xxx.49:10255: connect: connection refused

原因: 现在的k8s都默认禁用了kubelet的10255端口，出现这个错误是因此在kubelet启动命令中启用了该端口

解决: 将- --kubelet-port=10255注释

metrics-server: no such host

1	unable to fetch metrics from Kubelet k8s-node-234 (k8s-node-234): Get https://k8s-node-234:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup k8s-node-234 on 10.96.0.10:53: no such host

解决: 使用kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP参数

参考: https://github.com/kubernetes-sigs/metrics-server/blob/master/README.md

pod无法解析域名

集群中新增了几台机器用于部署clickhouse用于做大数据分析，为了不让这类占用大量资源的Pod影响其它Pod，因此选择给机器打taint的形式控制该类Pod的调度, 创建Pod后发现这些Pod都会出现DNS解析异常,

原因；要注意容器网络，比如这里使用的是flannel是否容忍了这些机器的taint，不然的话，flannel是无法被调度到这些机器的,因此容器间的通信会出现问题，可以将类似flannel这些的基础POD容忍所有的NoScheule与NoExecute

解决: flannel的ds yaml中添加以下toleration，这样适用任何的场景

tolerations:
- effect: NoSchedule
  operator: Exists
- effect: NoExecute
  operator: Exists

Are you tring to mount a directory on to a file

原因: Yaml文件中使用了subPath, 但是mountPath指向了一个目录

解决: mountPath需要加上文件名

Kubernetes启动后提示slice: no such file ro directory

原因: yum安装的kubelet默认的是cgroupfs，而docker一般默认的是systemd。但是kubernetes安装的时候建议使用systemd, kubelet跟docker的不一致, 要么修改kubelet的启动参数 , 要么修改dokcer启动参数

解决:

docker的启动参数文件为: /etc/docker/daemon.json: "exec-opts": ["native.cgroupdriver=systemd”]

kubelet的启动参数文件为: /var/lib/kubelet/config.yaml: cgroupDriver: systemd

“cni0” already has an IP address different from xxx.xxxx.xxx.xxx

原因: 使用kubeadm reset 重复操作过, reset之后，之前flannel创建的bridge device cni0和网口设备flannel.1依然健在

解决: 添加之前需要清除下网络

kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker
systemctl start kubelet

kubeadm初始化时提示 CPU小于2

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR NumCPU]: the number of available CPUs 1 is less than the required 2
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`

原因: kubeadm对资源一定的要求，如果是测试环境无所谓的话,可忽略

解决:

1	使用 --ignore-preflight-errors 忽略

Unable to update cni config: no network found

原因: 还未部署网络插件容器，导致在/etc/cni下还没有文件

解决: 根据实际情况部署网络插件

while reading ‘google-dockercfg’ metadata

原因: 从其它机器访问上述这些url确实出现 404

解决: 由于是在RKE上部署k8s, 所以可能会去访问google相关的url, 不影响业务,可以忽略

no providers available to validate pod request

原因: 在api-server的启动参数enable-admission中设置了PodSecrityPolicy, 但是集群中又没有任何的podsecritypolicy，因此导致整个集群都无法新建出pod

解决: 删除相应的podsecritypolicy即可

unable to upgrade connection: Unauthorized

原因: kubelet的启动参数少了x509认证方式

解决: 配置证书的路径, 加上重启kubelet即可

kubectl get cs 提示<unknown>

原因: 这是个kubectl的bug, 跟版本相关，kubernetes有意废除get cs命令

解决: 目前对集群的运行无影响, 可通过加-oyaml 查看状态

安装kubeadm时提示Depends错误

原因: 跟kubeadm没多大关系, 系统安装的有问题

解决: 执行以下命令修复

1 2	apt --fix-broken install apt-get update

访问service时提示Connection refused

现象: 从另一环境中把yaml文件导入到新环境后有些service访问不通

1
2
3

telnet mongodb-mst.external 27017
Trying 10.97.135.242...
telnet: Unable to connect to remote host: Connection refused

首先排除了域名、端口的配置问题。

会发现提示连接拒绝.可以确定的是集群内的DNS是正常的.

那么就是通过clusterIP无法到达realserver. 查看iptables规则

发现提示default has no endpoints --reject-with icmp-port-unreachable

很奇怪, 提示没有endpoints, 但是使用kubectl get ep又能看到ep存在且配置没有问题

而且这个default是怎么来的.

为了方便部署, 很多配置是从别的环境导出的配置, 有些service访问是没问题的, 只有少部分connection refused

结比一下发现一个很有趣的问题，先来看下不正常的yaml文件:

由于服务在集群外部署的, 因此这里使用了subset方式, 开始怀疑问题在这里, 但是后来知道这个不是重点

乍一看这个配置没什么问题, 部署也很正常, 但是对比正常的yaml文件，发现一个区别：

如果在services中的端口指定了名字, 那么在subsets中的端口也要带名字, 没有带名字的就会出现connection refused，这个确实之前从来没有关注过, 一个端口的情况下也不会指定名字

而且这面iptalbes中提示的default刚好就是这里的port name,虽然不敢相信，但是也只能试一试这个方法: 在subsets中也加了port name

重新部署一个，再次查看iptalbes规则

iptables-save|grep mongodb-mst

OMG, 居然可行, 再看下telnet的结果:

1
2
3

Trying 10.105.116.92...
Connected to mongodb-mst.external.svc.cluster.local.
Escape character is '^]'.

访问也是没问题, 那么原因就在于:

在service中指定了port name时, 也需要在ep中指定port name

error converting fieldPath: field label not supported

今天遇到一个部署deployment出错的问题, yaml文件如下:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
  namespace: 4test
  labels:
    app: config-demo-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: config-demo-app
  template:
    metadata:
      labels:
        app: config-demo-app
      annotations:
        # The field we'll use to couple our ConfigMap and Deployment
        configHash: 4431f6d28fdf60c8140d28c42cde331a76269ac7a0e6af01d0de0fa8392c1145
    spec:
      containers:
      - name: config-demo-app
        image: gcr.io/optimum-rock-145719/config-demo-app
        ports:
        - containerPort: 80
        envFrom:
        # The ConfigMap we want to use
        - configMapRef:
            name: demo-config
        # Extra-curricular: We can make the hash of our ConfigMap available at a
        # (e.g.) debug endpoint via a fieldRef
        env:
        - name: CONFIG_HASH
          #value: "4431f6d28fdf60c8140d28c42cde331a76269ac7a0e6af01d0de0fa8392c1145"
          valueFrom:
            fieldRef:
              fieldPath: spec.template.metadata.annotations.configHash

提示以下错误:

会提示Unsupported value:spec.template.metadata.annotations.configHash

目的很简单: container中的环境变量中引用configHash变量, 这个值是当configmap变更时比对两个不同的sha值以此达到重启pod的目的, 但fieldPath显然不支持spec.template.metadata.annotations.configHash

从报错提示来看, 支持列表有metadata.name, metadata.namespace, metadata.uid, spec.nodeName,spec.serviceAccountName, status.hostIp, status.PodIP, status.PodIPs

这些值用于容器中需要以下信息时可以不从k8s的apiserver中获取而是可以很方便地从这些变量直接获得

参考:

https://www.magalix.com/blog/kubernetes-patterns-the-reflection-pattern

https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/

参考文章:

cilium在kubernetes中的生产实践六(cilium排错指南)之api-rate-limit

Kubernetes : Le Node Authorization Mode de l'API-Server

https://www.ibm.com/docs/en/cloud-private/3.2.0?topic=console-namespace-is-stuck-in-terminating-state

https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definition-versioning/

https://github.com/kubernetes/kubernetes/issues/19317

http://www.xuyasong.com/?p=1725

https://kubernetes.io/

https://fuckcloudnative.io/

https://www.cnblogs.com/breezey/p/8810039.html

https://ieevee.com/tech/2018/04/25/downwardapi.html

https://www.magalix.com/blog/kubernetes-patterns-the-reflection-pattern

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.19/#deploymentspec-v1-apps

https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/

https://github.com/kubernetes/kubernetes/pull/62167/files

https://github.com/kubernetes-sigs/metrics-server/blob/master/README.md

https://github.com/kubernetes/kubeadm/issues/1596

https://izsk.me/2022/01/27/Kubernetes-pod-status-is-UnexpectedAdmissionError