Learning Kube-batch (Installation and Deployment)

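The previous post gave a brief introduction to kube-batch. This one covers how to install and configure it. The official GitHub docs already explain this clearly, but this post serves as a summary and calls out a few points that need special attention.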

Kubernetes has long supported running multiple schedulers in one cluster, so the new scheduler can be installed directly into an existing cluster.
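Each pod chooses its scheduler through `spec.schedulerName`; pods that leave it unset are handled by `default-scheduler`. A minimal sketch of a pod that opts into kube-batch, assuming kube-batch runs under its default scheduler name `kube-batch` (the pod name, container name and image are hypothetical). It only shows how the scheduler is selected; batch jobs that need gang scheduling also belong to a PodGroup.

```yaml
# Minimal sketch: a pod that asks kube-batch (not default-scheduler) to place it.
# Assumption: kube-batch runs with its default scheduler name, "kube-batch".
apiVersion: v1
kind: Pod
metadata:
  name: kube-batch-demo        # hypothetical name
spec:
  schedulerName: kube-batch    # omit this field and default-scheduler takes the pod
  containers:
  - name: main                 # hypothetical container
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
```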

This post is based on kube-batch's release-0.5 branch, which is also the latest official release tag so far.

git clone -b release-0.5 https://github.com/kubernetes-sigs/kube-batch
cd kube-batch

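The official docs install kube-batch with Helm. However, Helm v2 and v3 differ in several ways, so that route does not work in every environment. I ran into a failed installation myself at first: the Helm chart uses the annotation "helm.sh/hook": "crd-install", which is a Helm 2 feature that Helm 3 no longer supports. If the Helm installation fails, you can install from the YAML files directly, as follows: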

# Install the CRDs first (run from the repository root)
kubectl apply -f config/crds/ -n kube-system
# Install the default queue
kubectl apply -f config/queue/ -n kube-system
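The default queue is an ordinary Queue custom resource, so additional queues can be created the same way. A minimal sketch, assuming the v1alpha1 Queue CRD shipped with release-0.5 (the name `team-a` and the weight value are hypothetical). The `proportion` plugin, which is enabled in the default configuration, divides cluster resources among queues according to their weights.

```yaml
# Sketch of an extra queue, assuming the release-0.5 v1alpha1 Queue CRD.
# "team-a" and the weight are hypothetical values.
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: Queue
metadata:
  name: team-a      # assumed cluster-scoped, so no namespace is set
spec:
  weight: 2         # relative share used by the proportion plugin
```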

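Note that, as a scheduler, kube-batch watches the creation of many kinds of resources, so it naturally needs fairly broad permissions. The official repo provides an RBAC file, but it takes a blunt approach and simply grants cluster-admin. If that is a security concern, you should narrow the permissions; how far depends on your environment. Here I stick with the official permissions: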

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: default-sa-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io

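This grants cluster-admin directly to the default ServiceAccount in the kube-system namespace. As a result, every pod in kube-system that does not specify its own ServiceAccount runs with cluster-administrator privileges.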

Apply it from the directory containing the file:

kubectl apply -f . -n kube-system
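If cluster-admin is too much, one option is a dedicated ServiceAccount bound to a narrower ClusterRole. The sketch below is hypothetical and is not an official manifest: the names are made up, and the rule list is an assumption based on the resources a batch scheduler typically watches, binds and evicts. Treat it as a starting point and watch the kube-batch logs for "forbidden" errors.

```yaml
# Hypothetical least-privilege sketch for kube-batch (NOT the official manifest).
# The rules below are assumptions; verify them against the scheduler's logs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-batch
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-batch
rules:
# Core objects the scheduler caches and watches
- apiGroups: [""]
  resources: ["pods", "nodes", "persistentvolumes", "persistentvolumeclaims"]
  verbs: ["get", "list", "watch"]
# Volume binding during scheduling (assumption)
- apiGroups: [""]
  resources: ["persistentvolumes", "persistentvolumeclaims"]
  verbs: ["update"]
# Binding pods to nodes, updating pod status, evicting pods
- apiGroups: [""]
  resources: ["pods/binding"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["pods/status"]
  verbs: ["update", "patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["delete"]
# Scheduling events
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create", "patch", "update"]
- apiGroups: ["storage.k8s.io"]
  resources: ["storageclasses"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["policy"]
  resources: ["poddisruptionbudgets"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["scheduling.k8s.io"]
  resources: ["priorityclasses"]
  verbs: ["get", "list", "watch"]
# kube-batch's own CRDs (PodGroup status updates, queues)
- apiGroups: ["scheduling.incubator.k8s.io"]
  resources: ["podgroups", "podgroups/status", "queues"]
  verbs: ["get", "list", "watch", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-batch
subjects:
- kind: ServiceAccount
  name: kube-batch
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: kube-batch
  apiGroup: rbac.authorization.k8s.io
```

With this approach the kube-batch Deployment would also need `serviceAccountName: kube-batch` in its pod spec, and the `default-sa-admin` binding would no longer be required.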

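Next comes the configuration file. The official repo also provides a default one. Which plugins to enable depends on your actual use case; here I again use the official default configuration as the example: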

actions: "allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
  - name: nodeorder

This configuration packs in a lot of information. I won't go into what each part means here; that will come in a later post.

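The configuration above is the default: if no configuration file is specified, kube-batch uses it. kube-batch also supports loading a configuration file of your own.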

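Just pass the path of the YAML file with the --scheduler-conf startup flag. A common approach is to store the file in a ConfigMap and mount it into the container: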

apiVersion: v1
data:
  kube-batch-conf.yaml: |-
    actions: "allocate, backfill"
    tiers:
    - plugins:
      - name: gang
      - name: priority
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
      - name: nodeorder
kind: ConfigMap
metadata:
  name: kube-batch-config
  namespace: kube-system

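Finally, create the Deployment, together with a headless Service that exposes port 8080. The --scheduler-conf flag points at the file mounted from the ConfigMap above, and the kube-batch pod itself is still placed by default-scheduler (schedulerName: default-scheduler):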

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kube-batch
  name: kube-batch
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: kube-batch
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: kube-batch
    spec:
      containers:
      - args:
        - --logtostderr
        - --v
        - "3"
        - --scheduler-conf=/etc/kube-batch/kube-batch-conf.yaml
        image: kube-batch/kube-batch:v0.5
        imagePullPolicy: IfNotPresent
        name: kube-batch
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 2Gi
          requests:
            cpu: "2"
            memory: 2Gi
        securityContext:
          capabilities: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/kube-batch/kube-batch-conf.yaml
          name: config
          subPath: kube-batch-conf.yaml
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 256
          name: kube-batch-config
          optional: false
        name: config
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: kube-batch
  name: kube-batch
  namespace: kube-system
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: kube-batch
  sessionAffinity: None
  type: ClusterIP

kube-batch ships with some Prometheus metrics of its own, mainly scheduling latency and the numbers of successful and failed scheduling attempts.

The ServiceMonitor is as follows:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: kube-batch
  name: kube-batch
  namespace: monitoring
spec:
  endpoints:
  - honorLabels: true
    interval: 30s
    port: http
  jobLabel: app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app: kube-batch

Finally, once kube-batch shows up as running in the kube-system namespace, the installation is complete.
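To check that kube-batch actually schedules workloads, a small gang-scheduled Job can serve as a smoke test. The sketch below is modeled on the Job/PodGroup example in the kube-batch README; it assumes the release-0.5 v1alpha1 PodGroup CRD and the `scheduling.k8s.io/group-name` annotation, and all names, sizes and the `sleep` command are hypothetical.

```yaml
# Smoke test modeled on the kube-batch README example (names/sizes hypothetical).
apiVersion: scheduling.incubator.k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: qj-1
  namespace: default
spec:
  minMember: 3          # gang: bind only when all 3 pods can be placed together
---
apiVersion: batch/v1
kind: Job
metadata:
  name: qj-1
  namespace: default
spec:
  completions: 3
  parallelism: 3
  template:
    metadata:
      annotations:
        scheduling.k8s.io/group-name: qj-1   # joins the PodGroup above
    spec:
      schedulerName: kube-batch              # hand the pods to kube-batch
      restartPolicy: Never
      containers:
      - name: busybox
        image: busybox
        command: ["sh", "-c", "sleep 30"]
        resources:
          requests:
            cpu: 100m
```

If kube-batch is working, the three pods should start together. Setting `minMember` higher than the number of pods the Job creates should leave all of them Pending, which is a quick way to see the gang plugin in action.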

The popular tf (TensorFlow, through Kubeflow's tf-operator) also supports kube-batch natively.

References: