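The previous post gave a brief introduction to kube-batch. This post covers installing and configuring it. The official GitHub documentation is already very clear, but this write-up serves two purposes: it gathers the steps in one place, and it calls out a few points that need special attention.
Kubernetes has long supported running multiple schedulers in one cluster, so the new scheduler can be installed directly alongside the existing one.
This post is based on kube-batch release-0.5, which is the latest official release tag at the time of writing.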

    git clone -b release-0.5 https://github.com/kubernetes-sigs/kube-batch

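The official documentation installs kube-batch with helm. However, helm version 2 and version 3 differ in some ways, so this method does not succeed in every environment. The author hit a failed install at first: the helm chart uses the annotation "helm.sh/hook": "crd-install", which places requirements on the helm version. If the helm install fails, you can install directly from the yaml files as follows: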
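
    # Run from the root of the cloned repository.
    # Each step runs in a subshell so the working directory is restored afterwards.
    (cd config/crds && kubectl apply -f . -n kube-system)
    (cd config/queue && kubectl apply -f . -n kube-system)
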
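For reference, the hook annotation mentioned above typically sits on a CRD template inside the chart, roughly as in the fragment below. This is a sketch rather than a copy of the kube-batch chart: the CRD name and apiVersion are assumptions for illustration. Helm 3 dropped the crd-install hook in favor of a chart-level crds/ directory, so a template marked this way may be skipped, which would explain a failed install.

```yaml
# Illustrative fragment only; the CRD name and apiVersion are assumptions.
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: podgroups.scheduling.incubator.k8s.io
  annotations:
    # Helm 2 installs resources with this hook before the rest of the chart.
    # Helm 3 no longer recognizes the hook.
    "helm.sh/hook": "crd-install"
```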
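Note that because kube-batch is a scheduler, it watches the creation of many kinds of resources and therefore needs fairly broad permissions. The official repository provides a permissions file, but it is blunt: it grants cluster admin outright. If security is a concern, the permissions should be narrowed; how far depends on your environment. This post sticks with the official permissions for illustration: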

    # Note: rbac.authorization.k8s.io/v1beta1 was removed in Kubernetes 1.22;
    # newer clusters need rbac.authorization.k8s.io/v1.
    apiVersion: rbac.authorization.k8s.io/v1beta1
    kind: ClusterRoleBinding
    metadata:
      name: default-sa-admin
    subjects:
      - kind: ServiceAccount
        name: default
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: cluster-admin
      apiGroup: rbac.authorization.k8s.io

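This binds the default sa in the kube-system ns directly to cluster-admin. As a result, every pod in the kube-system ns that does not specify its own sa runs with cluster administrator permissions. Apply the file from the directory that contains it:

    kubectl apply -f . -n kube-system
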
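One way to limit the exposure, sketched here under assumptions, is to bind the role to a dedicated ServiceAccount rather than to default, so that only the kube-batch pod receives the elevated permissions. The names kube-batch and kube-batch-admin are hypothetical. The binding still uses cluster-admin; replacing it with a narrower ClusterRole would require the exact list of resources kube-batch watches, which this post does not enumerate.

```yaml
# Hypothetical dedicated ServiceAccount for the scheduler (name is an assumption).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-batch
  namespace: kube-system
---
# Uses the stable rbac.authorization.k8s.io/v1 API.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-batch-admin   # hypothetical name
subjects:
  - kind: ServiceAccount
    name: kube-batch       # only this account is bound, not "default"
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

With this variant, the pod template of the kube-batch Deployment would also need `serviceAccountName: kube-batch` under `spec.template.spec`, otherwise the pod keeps running as the default sa.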
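Next is the configuration file. The official repository also provides a default one. Which plugins to enable depends on your actual workload; this post again uses the official default configuration for illustration: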

    actions: "allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder

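This configuration carries a lot of information. Its meaning is not covered here and will be explained in a later post.
The configuration above is the default: if no configuration file is specified, kube-batch uses it. kube-batch also supports a custom configuration file.
You only need to point a startup argument at the yaml configuration file. The configuration itself can be stored in a ConfigMap, as follows: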

    apiVersion: v1
    data:
      kube-batch-conf.yaml: |-
        actions: "allocate, backfill"
        tiers:
        - plugins:
          - name: gang
          - name: priority
        - plugins:
          - name: drf
          - name: predicates
          - name: proportion
          - name: nodeorder
    kind: ConfigMap
    metadata:
      name: kube-batch-config
      namespace: kube-system

Finally, create the deployment:

    # Note: extensions/v1beta1 Deployments were removed in Kubernetes 1.16;
    # newer clusters need apps/v1.
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      labels:
        app: kube-batch
      name: kube-batch
      namespace: kube-system
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: kube-batch
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            app: kube-batch
        spec:
          containers:
          - args:
            - --logtostderr
            - --v
            - "3"
            # Must match the mountPath of the ConfigMap volume below.
            - --scheduler-conf=/etc/kube-batch/kube-batch-conf.yaml
            image: kube-batch/kube-batch:v0.5
            imagePullPolicy: IfNotPresent
            name: kube-batch
            ports:
            - containerPort: 8080
              name: http
              protocol: TCP
            resources:
              limits:
                cpu: "2"
                memory: 2Gi
              requests:
                cpu: "2"
                memory: 2Gi
            securityContext:
              capabilities: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/kube-batch/kube-batch-conf.yaml
              name: config
              subPath: kube-batch-conf.yaml
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              # 256 decimal is 0400 octal (read-only for the owner).
              defaultMode: 256
              name: kube-batch-config
              optional: false
            name: config
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: kube-batch
      name: kube-batch
      namespace: kube-system
    spec:
      clusterIP: None
      ports:
      - name: http
        port: 8080
        protocol: TCP
        targetPort: 8080
      selector:
        app: kube-batch
      sessionAffinity: None
      type: ClusterIP

kube-batch ships with some Prometheus metrics, mainly scheduling latency and the counts of successful and failed scheduling attempts.
The ServiceMonitor is as follows:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      labels:
        app: kube-batch
      name: kube-batch
      namespace: monitoring
    spec:
      endpoints:
      - honorLabels: true
        interval: 30s
        port: http
      jobLabel: app
      namespaceSelector:
        matchNames:
        - kube-system
      selector:
        matchLabels:
          app: kube-batch

Finally, once kube-batch is running successfully in the kube-system ns, the installation is complete.
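A workload has to opt in to the new scheduler through `schedulerName`, so one way to confirm that kube-batch is actually scheduling pods is to submit a small gang-scheduled job. The sketch below is loosely modeled on the pattern in the kube-batch tutorial and rests on several assumptions: that kube-batch runs under the scheduler name kube-batch, that the PodGroup API group and version match the CRDs installed from config/crds, and that the job name qj-1, the busybox image, and the sleep command are placeholders.

```yaml
# Sketch only: names, image, and the PodGroup apiVersion are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-1
spec:
  completions: 3
  parallelism: 3
  template:
    metadata:
      annotations:
        # Ties the pods to the PodGroup defined below.
        scheduling.k8s.io/group-name: qj-1
    spec:
      # Hands these pods to kube-batch instead of the default scheduler.
      schedulerName: kube-batch
      restartPolicy: Never
      containers:
      - name: busybox
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: "1"
---
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-1
spec:
  # The gang plugin should hold the pods until all 3 can be placed together.
  minMember: 3
```

If the pods stay Pending, the kube-batch logs and the pod events are the first places to look, since the default scheduler ignores pods that name another scheduler.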
The popular tf (TensorFlow) also supports kube-batch natively.
References: