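Some legacy applications on the platform still store their monitoring data in InfluxDB. InfluxDB currently runs as a single instance, so a machine failure could take it down at any time. Because InfluxDB will stay in service for a long while, it needs to be extended to an HA setup. The solution chosen here is influxdb-relay.
Note that relay does not synchronize data between the two InfluxDB instances. It only provides dual writes: if one instance fails, data is still written to the healthy one, so nothing is lost. Backfilling the missed data onto the failed instance requires manual intervention.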
            ┌─────────────────┐
            │writes & queries │
            └─────────────────┘
                     │
                     ▼
             ┌───────────────┐
             │               │
    ┌────────│ Load Balancer │─────────┐
    │        │               │         │
    │        └──────┬─┬──────┘         │
    │               │ │                │
    │               │ │                │
    │        ┌──────┘ └────────┐       │
    │        │ ┌─────────────┐ │       │┌──────┐
    │        │ │/write or UDP│ │       ││/query│
    │        ▼ └─────────────┘ ▼       │└──────┘
    │  ┌──────────┐      ┌──────────┐  │
    │  │ InfluxDB │      │ InfluxDB │  │
    │  │ Relay    │      │ Relay    │  │
    │  └──┬────┬──┘      └────┬──┬──┘  │
    │     │    |              |  │     │
    │     |  ┌─┼──────────────┘  |     │
    │     │  │ └──────────────┐  │     │
    │     ▼  ▼                ▼  ▼     │
    │  ┌──────────┐      ┌──────────┐  │
    │  │          │      │          │  │
    └─▶│ InfluxDB │      │ InfluxDB │◀─┘
       │          │      │          │
       └──────────┘      └──────────┘
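Three applications are involved:
InfluxDB: the InfluxDB instances.
influxdb-relay: proxies InfluxDB write traffic and uses dual writes to make sure data lands in both InfluxDB databases. Read traffic still goes from the load balancer straight to an InfluxDB instance and does not pass through relay.
Load balancer: all read and write traffic for InfluxDB enters through this service. Writes are forwarded to influxdb-relay and reads go directly to the InfluxDB instances. nginx is sufficient for this. Every other application that needs to reach InfluxDB must be configured with the address of this service.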
Adding an instance

The InfluxDB version running in production is:
InfluxDB v1.7.4 (git: 1.7 ef77e72f435b71b1ad6da7d6a6a4c4a262439379)
Existing instance is deployed on: 192.168.1.5
A new InfluxDB instance needs to be deployed on another machine: 192.168.1.6
    wget https://dl.influxdata.com/influxdb/releases/influxdb_1.7.4_amd64.deb
    dpkg -i influxdb_1.7.4_amd64.deb
Both instances use the configuration file below. Adjust the individual parameters to your situation:
cat /etc/influxdb/influxdb.conf
    reporting-disabled = false
    bind-address = ":8088"

    [meta]
    dir = "/data/influxdb/meta"
    retention-autocreate = true

    [data]
    dir = "/data/influxdb/data"
    wal-dir = "/data/influxdb/wal"
    wal-fsync-delay = "50ms"
    index-version = "inmem"
    trace-logging-enabled = false
    query-log-enabled = false
    validate-keys = true
    cache-max-memory-size = "4g"
    compact-full-write-cold-duration = "6h"
    max-concurrent-compactions = 0
    compact-throughput = "64m"
    compact-throughput-burst = "128m"
    tsm-use-madv-willneed = false
    max-series-per-database = 0
    max-values-per-tag = 0

    [coordinator]
    write-timeout = "600s"
    max-concurrent-queries = 0
    query-timeout = "0s"
    log-queries-after = "60s"
    max-select-point = 0
    max-select-series = 0
    max-select-buckets = 0

    [retention]
    enabled = true
    check-interval = "1h"

    [shard-precreation]
    enabled = true
    check-interval = "30m"
    advance-period = "30m"

    [monitor]
    store-enabled = true
    store-database = "_internal"
    store-interval = "60s"

    [http]
    enabled = true
    flux-enabled = true
    bind-address = ":8086"
    log-enabled = false
    write-tracing = false
    pprof-enabled = false
    debug-pprof-enabled = false
    https-enabled = false
    max-row-limit = 0
    max-connection-limit = 0
    unix-socket-enabled = false
    max-body-size = 0
    max-concurrent-write-limit = 0
    max-enqueued-write-limit = 0
    enqueued-write-timeout = 0

    [logging]
    format = "json"
    level = "info"
    suppress-logo = true

    [subscriber]
    enabled = false

    [[graphite]]
    enabled = true
    database = "graphite"
    retention-policy = "day_hour"
    bind-address = ":2003"
    protocol = "tcp"
    consistency-level = "one"
    batch-size = 1000
    batch-pending = 50
    batch-timeout = "1s"

    [[udp]]
    enabled = false

    [continuous_queries]
    enabled = true
    log-enabled = true
    query-stats-enabled = false
    run-interval = "10s"

    [tls]
Starting and stopping the instance:
    systemctl start influxd.service
    systemctl stop influxd.service
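One way to confirm that an instance is actually serving after a start is to call its `/ping` endpoint, which InfluxDB 1.x answers with HTTP 204. The sketch below is only an illustration: the function names `influx_ping_status` and `influx_is_up` are made up for this example, and it assumes `curl` is installed and that the HTTP port is 8086 as in the configuration above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of InfluxDB.

# Print the HTTP status code returned by /ping.
# $1 is host:port, for example 192.168.1.6:8086.
influx_ping_status() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$1/ping"
}

# Succeed only if the instance answers /ping with 204.
influx_is_up() {
  [ "$(influx_ping_status "$1")" = "204" ]
}

# Usage:
#   influx_is_up 192.168.1.5:8086 && echo "old instance is up"
#   influx_is_up 192.168.1.6:8086 && echo "new instance is up"
```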
Once the new InfluxDB instance is running, the production data has to be imported into it.
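relay

The relay service forwards the write traffic that arrives through the unified entry point. It can be deployed directly as a Docker container. There is currently one instance; it can be scaled out to two with identical configuration.
The configuration file is mounted into the container as a ConfigMap, with the following content: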
    [[http]]
    name = "relay-http"
    bind-addr = ":9096"
    output = [
        { name="db1" , location = "http://192.168.1.5:8086/write" },
        { name="db2" , location = "http://192.168.1.6:8086/write" },
    ]
In addition, create a Service (svc) for relay named influxdb-relay-headless.sensego, on port 9096.
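A quick way to check the dual-write behaviour is to send one point to the relay and then query each InfluxDB instance directly for it. The sketch below assumes `curl` is available, uses the `mytsd` database that appears in the backup commands later in this post, and writes to a measurement called `relay_smoke_test`, which is a made-up name for this example. The helper function names are hypothetical as well. A 204 from the relay is the expected success status.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a manual dual-write smoke test.

# Build the relay write URL. $1 = relay host:port, $2 = database.
relay_write_url() {
  printf 'http://%s/write?db=%s' "$1" "$2"
}

# Send one line-protocol point through the relay and print the HTTP status.
# $1 = relay host:port, $2 = database, $3 = line-protocol point.
relay_write() {
  curl -s -o /dev/null -m 5 -w '%{http_code}' \
    -XPOST "$(relay_write_url "$1" "$2")" --data-binary "$3"
}

# Ask one backend directly for the newest point of a measurement.
# $1 = InfluxDB host:port, $2 = database, $3 = measurement.
backend_latest() {
  curl -s -m 5 -G "http://$1/query" \
    --data-urlencode "db=$2" \
    --data-urlencode "q=SELECT * FROM $3 ORDER BY time DESC LIMIT 1"
}

# Usage:
#   relay_write influxdb-relay-headless.sensego:9096 mytsd 'relay_smoke_test,source=manual value=1'
#   backend_latest 192.168.1.5:8086 mytsd relay_smoke_test
#   backend_latest 192.168.1.6:8086 mytsd relay_smoke_test
```

If both queries return the point, the relay is writing to both instances. Stopping one instance and repeating the write should still succeed, with the point appearing only on the instance that stayed up.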
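nginx

As the architecture diagram shows, a proxy layer is needed in front of InfluxDB to handle its read and write traffic. nginx is used here.
nginx is deployed as a container; a single instance is enough.
The configuration file is as follows: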
    worker_processes 8;

    events {
        worker_connections 10240;
    }

    http {
        client_max_body_size 0;

        upstream relay {
            server influxdb-relay-headless.sensego:9096;
        }

        upstream db {
            ip_hash;
            server 192.168.1.5:8086 max_fails=1 fail_timeout=10s;
            server 192.168.1.6:8086 max_fails=1 fail_timeout=10s;
        }

        server {
            listen 9096;

            location /ping {
                proxy_pass http://db;
            }

            location /write {
                limit_except POST {
                    deny all;
                }
                proxy_pass http://relay;
            }

            location /query {
                proxy_pass http://db;
            }
        }
    }
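The routing rules in this configuration can be spot-checked from any client that can reach nginx. The sketch below is a rough illustration only: `NGINX_ADDR` is a hypothetical address (the post does not say how the nginx container is exposed), and `nginx_status` and `check_route` are made-up helper names. It relies on two behaviours: `/ping` is proxied to InfluxDB, which answers 204, and a non-POST request to `/write` is rejected by `limit_except` with 403.

```shell
#!/usr/bin/env bash
# NGINX_ADDR is a placeholder; replace it with the real address of the nginx container.
NGINX_ADDR="${NGINX_ADDR:-nginx.example.internal:9096}"

# Print the HTTP status code for a request. $1 = method, $2 = path.
nginx_status() {
  curl -s -o /dev/null -m 5 -X "$1" -w '%{http_code}' "http://${NGINX_ADDR}$2"
}

# Compare the observed status with the expected one and report the result.
# $1 = method, $2 = path, $3 = expected status code.
check_route() {
  local method="$1" path="$2" want="$3" got
  got="$(nginx_status "$method" "$path")"
  if [ "$got" = "$want" ]; then
    echo "ok   $method $path -> $got"
  else
    echo "FAIL $method $path -> $got (expected $want)"
    return 1
  fi
}

# Usage:
#   check_route GET /ping 204     # proxied to one of the InfluxDB instances
#   check_route GET /write 403    # blocked by limit_except, only POST is allowed
```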
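Data migration

InfluxDB only stores metrics data here, roughly 300 GB in total.
The data is not particularly sensitive, so the backup can be taken without downtime.
The approach is to take a full backup at a chosen point in time and then use an incremental backup to bring over the data written while the operation was in progress.
Although InfluxDB supports remote backups, it is better to run the backup locally on 192.168.1.5 and then copy the result to the new node.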
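Backup

    # Full backup of all databases
    influxd backup -host 192.168.1.5:8088 -portable /tmp/backup-all/influxdb

    # Incremental backup of the mytsd database for a time window
    # (example timestamps; use the window that covers the operation period)
    influxd backup -portable -database mytsd -start 2017-04-28T06:49:00Z -end 2017-04-28T06:50:00Z /tmp/backup-ins/influxdb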
Restore

    # Restore the full backup first
    influxd restore -portable /tmp/backup-all/influxdb

    # Then restore the incremental backup
    influxd restore -portable /tmp/backup-ins/influxdb
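The full sequence can be written down as a dry-run plan before executing anything. One caveat worth checking on your own setup: in InfluxDB 1.x, a `-portable` restore refuses to write into a database that already exists, so after the full restore has created `mytsd`, the incremental backup may need to be restored into a temporary database and then copied across with `SELECT ... INTO`. The sketch below follows that variant. It only prints commands, `migration_plan` is a made-up function name, and the temporary database name `mytsd_incr` is an assumption for illustration.

```shell
#!/usr/bin/env bash
# Print (not execute) the backup and restore commands for one migration run.
# $1 = start of the incremental window, $2 = end of the window (RFC3339, UTC).
migration_plan() {
  local start="$1" end="$2" db="mytsd"
  cat <<EOF
# on 192.168.1.5
influxd backup -portable -host 192.168.1.5:8088 /tmp/backup-all/influxdb
influxd backup -portable -database ${db} -start ${start} -end ${end} /tmp/backup-ins/influxdb
# on 192.168.1.6, after copying both directories over
influxd restore -portable /tmp/backup-all/influxdb
influxd restore -portable -db ${db} -newdb ${db}_incr /tmp/backup-ins/influxdb
influx -database ${db}_incr -execute 'SELECT * INTO "${db}"..:MEASUREMENT FROM /.*/ GROUP BY *'
influx -execute 'DROP DATABASE "${db}_incr"'
EOF
}

# Usage:
#   migration_plan 2017-04-28T06:49:00Z 2017-04-28T06:50:00Z
```

Printing the plan first makes it easy to review the time window and paths before running each command by hand on the right machine.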
At this point InfluxDB has been scaled from a single node to two nodes, which removes the single point of failure.