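The previous post gave a brief introduction to kube-batch. This one covers how to install and configure it. The official GitHub docs already explain this clearly, but this post serves as a summary, and a few points deserve special mention.
Kubernetes has long supported running multiple schedulers in one cluster, so the new scheduler can be installed directly into an existing cluster.
This post is based on kube-batch's release-0.5 branch, which corresponds to the latest official release at the time of writing.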
```shell
git clone -b release-0.5 https://github.com/kubernetes-sigs/kube-batch
```
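The official documentation installs kube-batch with Helm. Because Helm v2 and v3 differ, that route does not work in every environment, and at first the install failed for me as well. The cause is that the chart uses the annotation `"helm.sh/hook": "crd-install"`, which depends on the Helm version (the `crd-install` hook is a Helm 2 feature and was removed in Helm 3). If the Helm install fails, you can install directly from the YAML files instead: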
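```shell
# Run from the root of the cloned kube-batch repository
cd config/crds
kubectl apply -f . -n kube-system
# Move to config/queue (a plain `cd config/queue` would fail from inside config/crds)
cd ../queue
kubectl apply -f . -n kube-system
```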
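Note that because kube-batch is a scheduler, it watches many kinds of resources, so it needs fairly broad permissions. The official repository provides an RBAC file for this, but it is blunt: it simply grants cluster-admin. If you are concerned about security, you will need to narrow these permissions, and how far to go depends on your environment. Here I'll stick with the official permissions.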
```yaml
# The original file uses rbac.authorization.k8s.io/v1beta1, which was removed in Kubernetes 1.22; v1 is equivalent
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-sa-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```
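This binds cluster-admin directly to the default ServiceAccount in the kube-system namespace. As a result, every pod in kube-system that does not specify its own ServiceAccount runs with cluster-admin privileges.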
Apply it from the directory containing this file:

```shell
kubectl apply -f . -n kube-system
```
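If you would rather not hand cluster-admin to every pod in kube-system, one middle ground is to keep the same ClusterRole but bind it to a dedicated ServiceAccount that only kube-batch uses. The sketch below is not part of the official manifests: the ServiceAccount name `kube-batch` and the binding name `kube-batch-admin` are hypothetical. It still grants cluster-admin, so it limits *who* holds the permission, not *what* the permission allows.

```yaml
# Hypothetical dedicated ServiceAccount for kube-batch (not in the official manifests)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-batch
  namespace: kube-system
---
# Same cluster-admin role as the official file, bound only to the kube-batch ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-batch-admin   # hypothetical name
subjects:
- kind: ServiceAccount
  name: kube-batch
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

With this approach, the kube-batch Deployment's pod spec must also set `serviceAccountName: kube-batch`; otherwise the pod keeps running as the `default` ServiceAccount.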
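Next comes the configuration file. The official repository also ships a default configuration. Which plugins to enable depends on your actual needs, but here I'll again use the official default.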
```yaml
actions: "allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
  - name: nodeorder
```
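This configuration packs in a lot of information. I won't explain what each part means here and will cover it in detail in a later post.
The configuration above is the default: if no configuration file is specified, kube-batch uses it. kube-batch also supports a custom configuration file.
You only need to pass the YAML file through a startup flag (`--scheduler-conf`, as in the Deployment below). Here the file is provided through a ConfigMap: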
```yaml
apiVersion: v1
data:
  kube-batch-conf.yaml: |-
    actions: "allocate, backfill"
    tiers:
    - plugins:
      - name: gang
      - name: priority
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
kind: ConfigMap
metadata:
  name: kube-batch-config
  namespace: kube-system
```
Finally, create the Deployment:
```yaml
# The original manifest used extensions/v1beta1, which no longer serves Deployments since Kubernetes 1.16; apps/v1 requires spec.selector, which is already set
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-batch
  name: kube-batch
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kube-batch
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kube-batch
    spec:
      containers:
      - args:
        - --logtostderr
        - --v
        - "3"
        - --scheduler-conf=/etc/kube-batch/kube-batch-conf.yaml
        image: kube-batch/kube-batch:v0.5
        imagePullPolicy: IfNotPresent
        name: kube-batch
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: "2"
            memory: 2Gi
        securityContext:
          capabilities: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/kube-batch/kube-batch-conf.yaml
          name: config
          subPath: kube-batch-conf.yaml
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 256
          name: kube-batch-config
          optional: false
        name: config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kube-batch
  name: kube-batch
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: kube-batch
  sessionAffinity: None
  type: ClusterIP
```
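kube-batch exposes some Prometheus metrics out of the box, mainly scheduling latency and the numbers of successful and failed scheduling attempts.
If you run the Prometheus Operator, you can scrape them with a ServiceMonitor
like the following: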
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: kube-batch
  name: kube-batch
  namespace: monitoring
spec:
  endpoints:
  - honorLabels: true
    interval: 30s
    port: http
  jobLabel: app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: kube-batch
```
Finally, the kube-batch pod shows up running in the kube-system namespace, and the installation is complete.
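A running pod only shows that the scheduler started. To check that it actually schedules workloads, you can submit a small job that opts into it with `schedulerName: kube-batch` (kube-batch's default scheduler name). The sketch below is modeled on the example job in the kube-batch README. It assumes the v1alpha1 `PodGroup` API (`scheduling.incubator.k8s.io/v1alpha1`) and the `scheduling.k8s.io/group-name` pod annotation, so check the CRDs under `config/crds` in your checkout for the exact group and version. The name `qj-1`, the busybox image, the replica count, and the CPU request are illustrative only.

```yaml
# Illustrative test job; names and sizes are hypothetical
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-1
spec:
  completions: 3
  parallelism: 3
  template:
    metadata:
      annotations:
        # Assumed annotation linking the pods to the PodGroup below
        scheduling.k8s.io/group-name: qj-1
    spec:
      schedulerName: kube-batch   # hand these pods to kube-batch instead of default-scheduler
      restartPolicy: Never
      containers:
      - name: busybox
        image: busybox
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
---
# Assumed v1alpha1 PodGroup: all 3 pods must be placeable before any of them is bound
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-1
spec:
  minMember: 3
```

If kube-batch is working, the job's pods should be bound and run to completion. If the cluster cannot fit all `minMember` pods at once, they should all stay Pending instead of starting partially, which is the gang-scheduling behavior the `gang` plugin provides.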
The popular tf-operator for TensorFlow also supports kube-batch natively.