Deploying a Redis Sentinel Cluster on K8s: Automatic Master-Replica Failover + High Availability, a Complete Hands-On Guide

张开发
2026/4/4 20:15:34 · 15 min read


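1. What Is a Redis Sentinel Cluster

Redis master-replica replication keeps a redundant copy of the data, but if the master goes down, someone has to step in and switch over by hand. In production that is not acceptable. Redis Sentinel exists to solve this problem:

- Monitoring: continuously checks whether the master and replica nodes are alive
- Automatic failover: when the master goes down, elects a new master and reconfigures the replicas
- Notification: after a failover, tells clients the address of the new master

Redis Sentinel vs Redis Cluster

| Item | Sentinel | Cluster |
|---|---|---|
| Data sharding | None; all data lives in one instance | Yes; 16384 hash slots, sharded automatically |
| High availability | ✅ Automatic failover | ✅ Automatic failover per shard |
| Write capacity | Only the master accepts writes | The master of every shard accepts writes |
| Typical use | Write load that fits a single node, moderate data volume | Large data volume, needs horizontal scaling |
| Minimum nodes | 1 master + 2 replicas + 3 Sentinels = 6 | At least 6 nodes (3 masters + 3 replicas) |

This article focuses on Sentinel mode, which suits workloads that do not need sharding but must be highly available.

2. Overall Architecture

The recommended minimum high-availability architecture for production:

- Redis Master (master node): serves reads and writes
- Redis Slave x2 (replica nodes): asynchronous replication, serve read requests
- Sentinel x3 (Sentinel nodes): monitoring, voting, failover

Sentinel nodes should be spread across different physical machines or availability zones to avoid a single point of failure.

K8s deployment topology:

- Master StatefulSet: 1 replica, fixed DNS name redis-master
- Slave StatefulSet: 2 replicas, read requests distributed through a Service
- Sentinel Deployment: 3 replicas sharing one Service
- Headless Service: stable Pod DNS
- Master Service: selects the Pod labeled role: master, that is, the initial master. The label does not change on failover, so clients should look up the current master through Sentinel (see section 8)
- Slave Service: load-balances across all replicas
- Sentinel Service: the address clients use to reach Sentinel

3. Preparing the Configuration Files

Note that Redis configuration files do not support trailing comments on a directive line, so every comment below sits on its own line.

3.1 Redis Master configuration

redis-master.conf

    bind 0.0.0.0
    port 6379
    protected-mode no
    daemonize no
    pidfile /var/run/redis.pid
    loglevel notice
    logfile ""

    # Persistence: AOF + RDB
    appendonly yes
    appendfilename "appendonly.aof"
    appendfsync everysec

    # RDB snapshots
    save 900 1
    save 300 10
    save 60 10000
    rdbcompression yes
    rdbchecksum yes

    # Connections
    maxclients 10000
    timeout 300

    # Memory (adjust to the node size)
    maxmemory 2gb
    maxmemory-policy allkeys-lru

3.2 Redis Slave configuration

redis-slave.conf

    bind 0.0.0.0
    port 6379
    protected-mode no
    daemonize no
    pidfile /var/run/redis.pid
    loglevel notice
    logfile ""

    # Replica settings
    replica-read-only yes
    # Diskless replication, reduces disk IO
    repl-diskless-sync yes
    repl-diskless-sync-delay 5

    # Points at the master; rewritten dynamically on failover, but a default is required
    replicaof redis-master 6379

    # Persistence
    appendonly yes
    appendfilename "appendonly.aof"

    maxmemory 2gb
    maxmemory-policy allkeys-lru

3.3 Sentinel configuration (the core)

sentinel.conf

    port 26379
    protected-mode no
    daemonize no
    pidfile /var/run/redis-sentinel.pid
    loglevel notice
    logfile ""

    # Needed on Redis 6.2+ because the master below is given as a hostname, not an IP.
    # Must appear before the "sentinel monitor" line.
    sentinel resolve-hostnames yes

    # The master to monitor
    # sentinel monitor <master-name> <host> <port> <quorum>
    # quorum: how many Sentinels must agree that the master is down
    sentinel monitor mymaster redis-master 6379 2

    # How long the master must be unresponsive before it is considered down (ms)
    sentinel down-after-milliseconds mymaster 5000

    # Failover timeout (ms)
    sentinel failover-timeout mymaster 180000

    # Number of replicas that resync with the new master in parallel after a failover
    # (higher is faster but puts more load on the new master)
    sentinel parallel-syncs mymaster 1

    # Notification script (optional; the file must exist and be executable when Sentinel starts)
    sentinel notification-script mymaster /opt/notify.sh

⚠️ Reading the quorum: with 3 Sentinel nodes, quorum = 2 means at least 2 Sentinels must consider the master down before a failover is triggered. For N Sentinels, the recommended quorum is (N/2 + 1), using integer division. The failover itself must additionally be authorized by a majority of all Sentinels.

4. K8s Resource Manifests

4.1 ConfigMap (configuration in one place)

redis-configmap.yaml

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: redis-config
      namespace: redis
    data:
      redis-master.conf: |
        bind 0.0.0.0
        port 6379
        protected-mode no
        daemonize no
        appendonly yes
        maxmemory 2gb
        maxmemory-policy allkeys-lru
      redis-slave.conf: |
        bind 0.0.0.0
        port 6379
        protected-mode no
        daemonize no
        replica-read-only yes
        repl-diskless-sync yes
        replicaof redis-master 6379
        appendonly yes
        maxmemory 2gb
        maxmemory-policy allkeys-lru
      sentinel.conf: |
        port 26379
        protected-mode no
        daemonize no
        sentinel resolve-hostnames yes
        sentinel monitor mymaster redis-master 6379 2
        sentinel down-after-milliseconds mymaster 5000
        sentinel failover-timeout mymaster 180000
        sentinel parallel-syncs mymaster 1

4.2 Master StatefulSet

The ConfigMap is mounted read-only at /config, and an init container copies the file into a writable emptyDir at /etc/redis. Redis and Sentinel both rewrite their configuration files at runtime, so the file the server starts from must be writable. The data volume comes from a volumeClaimTemplate; the 5Gi size is only an example value.

redis-master-sts.yaml

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: redis-master
      namespace: redis
    spec:
      serviceName: redis-master
      replicas: 1
      selector:
        matchLabels:
          app: redis
          role: master
      template:
        metadata:
          labels:
            app: redis
            role: master
        spec:
          initContainers:
            # Copy the config from the read-only ConfigMap into the writable emptyDir
            - name: init
              image: redis:7-alpine
              command: ["sh", "-c", "cp /config/redis-master.conf /etc/redis/redis.conf"]
              volumeMounts:
                - name: redis-config
                  mountPath: /config
                - name: conf
                  mountPath: /etc/redis
          containers:
            - name: redis
              image: redis:7-alpine
              command: ["redis-server", "/etc/redis/redis.conf"]
              ports:
                - containerPort: 6379
              volumeMounts:
                - name: data
                  mountPath: /data
                - name: conf
                  mountPath: /etc/redis
          volumes:
            - name: conf
              emptyDir: {}
            - name: redis-config
              configMap:
                name: redis-config
                items:
                  - key: redis-master.conf
                    path: redis-master.conf
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi   # example size, adjust as needed

4.3 Slave StatefulSet (init container configures replicaof automatically)

redis-slave-sts.yaml

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: redis-slave
      namespace: redis
    spec:
      serviceName: redis-slave
      replicas: 2
      selector:
        matchLabels:
          app: redis
          role: slave
      template:
        metadata:
          labels:
            app: redis
            role: slave
        spec:
          initContainers:
            # Copy the config, then append the replicaof directive
            # (redundant with the ConfigMap default, kept as an explicit safeguard)
            - name: init
              image: redis:7-alpine
              command: ["sh", "-c", "cp /config/redis-slave.conf /etc/redis/redis.conf && echo 'replicaof redis-master 6379' >> /etc/redis/redis.conf"]
              volumeMounts:
                - name: redis-config
                  mountPath: /config
                - name: conf
                  mountPath: /etc/redis
          containers:
            - name: redis
              image: redis:7-alpine
              command: ["redis-server", "/etc/redis/redis.conf"]
              ports:
                - containerPort: 6379
              volumeMounts:
                - name: data
                  mountPath: /data
                - name: conf
                  mountPath: /etc/redis
          volumes:
            - name: conf
              emptyDir: {}
            - name: redis-config
              configMap:
                name: redis-config
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 5Gi   # example size, adjust as needed

4.4 Sentinel Deployment

redis-sentinel-deploy.yaml

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: redis-sentinel
      namespace: redis
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: redis-sentinel
      template:
        metadata:
          labels:
            app: redis-sentinel
        spec:
          initContainers:
            # Sentinel persists its state into sentinel.conf, so it needs a writable copy
            - name: init
              image: redis:7-alpine
              command: ["sh", "-c", "cp /config/sentinel.conf /etc/redis/sentinel.conf"]
              volumeMounts:
                - name: sentinel-conf
                  mountPath: /config
                - name: conf
                  mountPath: /etc/redis
          containers:
            - name: sentinel
              image: redis:7-alpine
              command: ["redis-sentinel", "/etc/redis/sentinel.conf"]
              ports:
                - containerPort: 26379
              volumeMounts:
                - name: conf
                  mountPath: /etc/redis
          volumes:
            - name: conf
              emptyDir: {}
            - name: sentinel-conf
              configMap:
                name: redis-config

4.5 Service configuration

redis-services.yaml

    ---
    # Master headless Service: Sentinel needs to find the master through DNS
    apiVersion: v1
    kind: Service
    metadata:
      name: redis-master
      namespace: redis
    spec:
      type: ClusterIP
      clusterIP: None   # Headless
      selector:
        app: redis
        role: master
      ports:
        - port: 6379
          targetPort: 6379
    ---
    # Slave Service: distributes read requests
    apiVersion: v1
    kind: Service
    metadata:
      name: redis-slave
      namespace: redis
    spec:
      type: ClusterIP
      selector:
        app: redis
        role: slave
      ports:
        - port: 6379
          targetPort: 6379
    ---
    # Sentinel Service: clients connect to Sentinel through this
    apiVersion: v1
    kind: Service
    metadata:
      name: redis-sentinel
      namespace: redis
    spec:
      type: ClusterIP
      selector:
        app: redis-sentinel
      ports:
        - port: 26379
          targetPort: 26379

5. One-Shot Deployment

Deployment commands. The Services are applied before the workloads so that the redis-master DNS name already exists when the replicas and Sentinels start.

    # 1. Create the namespace
    kubectl create ns redis

    # 2. Deploy the ConfigMap
    kubectl apply -f redis-configmap.yaml -n redis

    # 3. Deploy the Services
    kubectl apply -f redis-services.yaml -n redis

    # 4. Deploy the master
    kubectl apply -f redis-master-sts.yaml -n redis

    # 5. Deploy the replicas
    kubectl apply -f redis-slave-sts.yaml -n redis

    # 6. Deploy Sentinel
    kubectl apply -f redis-sentinel-deploy.yaml -n redis

    # 7. Verify the deployment
    kubectl get pods -n redis -o wide
    kubectl get svc -n redis

6. Verifying Cluster State

6.1 Verify master-replica replication

    # Connect to the master and write data
    kubectl exec -it redis-master-0 -n redis -- redis-cli -p 6379 SET test-key "Hello Sentinel"
    kubectl exec -it redis-master-0 -n redis -- redis-cli -p 6379 GET test-key

    # Connect to a replica and verify that replication worked
    kubectl exec -it redis-slave-0 -n redis -- redis-cli -p 6379 GET test-key
    # Expected output: "Hello Sentinel"

6.2 Verify the Sentinel cluster state

The commands below reuse one Sentinel Pod name, stored in a shell variable.

    # Pick the first Sentinel Pod
    SENTINEL_POD=$(kubectl get pod -l app=redis-sentinel -n redis -o jsonpath='{.items[0].metadata.name}')

    # Show the master address as seen by Sentinel
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

    # Show the monitored masters and their state
    # (the num-other-sentinels field shows how many peer Sentinels were discovered)
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL masters

    # List the replicas
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL slaves mymaster

6.3 Verify overall cluster health

    # Show the status of all Pods
    kubectl get pods -n redis -l 'app in (redis,redis-sentinel)'

    # Show Redis replication info
    kubectl exec -it redis-master-0 -n redis -- redis-cli -p 6379 INFO replication

    # Show the master address as seen by Sentinel
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

7. Failover in Practice

Demo: simulating a master outage

Step 1: Record the current master

    # Record the current master address
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

Step 2: Inject the fault (kill the master Pod)

    kubectl delete pod redis-master-0 -n redis
    kubectl get pods -n redis -w

Step 3: Observe the failover

    # Wait 5-10 seconds, then check whether the master has switched
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

    # Find out which Pod was elected as the new master (role = master)
    for p in $(kubectl get pods -n redis -l app=redis -o jsonpath='{.items[*].metadata.name}'); do
      echo "$p"
      kubectl exec "$p" -n redis -- redis-cli -p 6379 ROLE 2>/dev/null
    done

Step 4: Verify data integrity

    # Once the old master is back online it becomes a replica
    kubectl exec -it redis-master-0 -n redis -- redis-cli -p 6379 ROLE
    # Expected output: slave once Sentinel has reconfigured it
    # (it may briefly report master right after the restart)

    # Check whether data was lost
    kubectl exec -it redis-master-0 -n redis -- redis-cli -p 6379 GET test-key

A clean failover should ideally lose no data. Because replication is asynchronous, writes that had not yet reached a replica when the master died can still be lost.

8. Using the Cluster Correctly from Clients

8.1 Python client (connecting through Sentinel)

redis_client.py

    from redis.sentinel import Sentinel

    # Connect to the Sentinel cluster (use the K8s Service address)
    sentinel = Sentinel(
        [("redis-sentinel.redis.svc.cluster.local", 26379)],
        socket_timeout=0.1,
    )

    # Connection to the current master (for writes)
    master = sentinel.master_for("mymaster", socket_timeout=1)

    # Connection to a replica (for reads)
    slave = sentinel.slave_for("mymaster", socket_timeout=1)

    # Writes go to the master
    master.set("key1", "value1")

    # Reads go to a replica (load-balanced)
    result = slave.get("key1")

    # The current master is looked up automatically; no config change after a failover.
    # Sentinel tracks master changes and master_for returns the new master.

8.2 Java client (Jedis)

RedisSentinelClient.java

    import java.util.HashSet;
    import java.util.Set;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPoolConfig;
    import redis.clients.jedis.JedisSentinelPool;

    public class RedisSentinelClient {
        public static void main(String[] args) {
            JedisPoolConfig poolConfig = new JedisPoolConfig();
            poolConfig.setMaxTotal(100);
            poolConfig.setMaxIdle(20);

            Set<String> sentinels = new HashSet<>();
            sentinels.add("redis-sentinel.redis.svc.cluster.local:26379");

            JedisSentinelPool pool = new JedisSentinelPool(
                    "mymaster",   // must match the master-name in the Sentinel config
                    sentinels,
                    poolConfig,
                    3000);        // connection timeout (ms)

            // Get a connection; the pool tracks master changes automatically.
            // Closing the Jedis object returns it to the pool, it does not close the pool.
            try (Jedis jedis = pool.getResource()) {
                jedis.set("hello", "world");
                String value = jedis.get("hello");
            }
            pool.close();
        }
    }

✅ The right approach: clients always connect to the Sentinel cluster, and Sentinel returns the address of the current master. After a failover Sentinel reports the new master automatically, so clients need neither a restart nor a configuration change.

9. Advanced Configuration for Production

9.1 Data persistence (AOF + RDB recommended)

    # Recommended persistence settings: AOF everysec + RDB snapshots
    appendonly yes
    # A balance between performance and safety
    appendfsync everysec
    # Snapshot if at least 1 key changed in 15 minutes
    save 900 1
    # Snapshot if at least 10 keys changed in 5 minutes
    save 300 10
    # Snapshot if at least 10000 keys changed in 1 minute
    save 60 10000

9.2 Resource limits (mandatory in production)

redis-master-sts-resources.yaml (adds resource limits)

    # Add a resources field under containers
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: 2000m
        memory: 4Gi

9.3 Probes (keeping the service available)

    # Add livenessProbe and readinessProbe under containers
    livenessProbe:
      tcpSocket:
        port: 6379
      initialDelaySeconds: 15
      periodSeconds: 10
    readinessProbe:
      exec:
        command: ["redis-cli", "ping"]
      initialDelaySeconds: 5
      periodSeconds: 5

9.4 Failover notification script

notify.sh (mounted into the Sentinel containers). Sentinel runs the script once per event and passes the event type and the event description as arguments, so the script sends a single message and exits instead of looping. It assumes curl is available in the Sentinel image.

    #!/bin/sh
    # $1 = event type (for example +switch-master), $2 = event description
    EVENT_TYPE="$1"
    EVENT_DESC="$2"

    # Send the alert to DingTalk / WeCom / Feishu
    curl -s -H 'Content-Type: application/json' \
      -d "{\"msgtype\":\"text\",\"text\":{\"content\":\"Redis Sentinel failover notification: ${EVENT_TYPE} ${EVENT_DESC}\"}}" \
      "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"

10. Troubleshooting Command Cheat Sheet

    # List all Redis Pods
    kubectl get pods -n redis -l app=redis -o wide

    # Show the master/replica role
    kubectl exec -it redis-master-0 -n redis -- redis-cli -p 6379 ROLE

    # Show replication offsets and link status
    kubectl exec redis-master-0 -n redis -- redis-cli -p 6379 INFO replication | grep -E 'role|master_link_status|master_repl_offset'

    # Show the Sentinel cluster state
    SENTINEL_POD=$(kubectl get pod -l app=redis-sentinel -n redis -o jsonpath='{.items[0].metadata.name}')
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 INFO sentinel

    # Check that enough Sentinels are reachable to reach quorum and authorize a failover
    kubectl exec -it "$SENTINEL_POD" -n redis -- redis-cli -p 26379 SENTINEL ckquorum mymaster

    # Show the Redis and Sentinel logs
    kubectl logs redis-master-0 -n redis --tail=100
    kubectl logs "$SENTINEL_POD" -n redis --tail=50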
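
The `INFO replication` check in the cheat sheet can also be read programmatically. The following is a minimal sketch, not part of the original setup: it parses the text that `redis-cli INFO replication` prints on a master and reports how many bytes each replica is behind. The function names and the sample output (including the Pod IPs) are hypothetical and chosen only for illustration.

```python
def parse_info(text):
    """Parse `redis-cli INFO <section>` output into a flat dict of strings."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and section headers such as "# Replication"
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def replica_lag_bytes(info):
    """Map each replica address to master_repl_offset minus the replica offset."""
    master_offset = int(info["master_repl_offset"])
    lags = {}
    for key, value in info.items():
        # Replica entries are named slave0, slave1, ... on the master
        if not (key.startswith("slave") and key[5:].isdigit()):
            continue
        # value looks like: ip=...,port=...,state=online,offset=...,lag=...
        parts = dict(item.split("=", 1) for item in value.split(","))
        address = parts["ip"] + ":" + parts["port"]
        lags[address] = master_offset - int(parts["offset"])
    return lags


# Hypothetical sample of what a master might print (addresses are made up)
SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=10.244.1.12,port=6379,state=online,offset=5600,lag=0
slave1:ip=10.244.2.7,port=6379,state=online,offset=5460,lag=1
master_repl_offset:5600
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["role"], replica_lag_bytes(info))
```

In practice the text would come from piping the `kubectl exec ... INFO replication` command into the script; a replica whose byte lag keeps growing is a sign that it would lose data if it were promoted.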

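As a footnote to the quorum rule from the Sentinel configuration (quorum = N/2 + 1 for N Sentinels), here is a small sketch of the arithmetic. The helper names are hypothetical. It relies on the fact that a failover must be authorized by a majority of all Sentinels, so the number of Sentinel outages that can be survived is N minus that majority.

```python
def recommended_quorum(sentinels):
    """Recommended quorum for N Sentinels: N // 2 + 1 (a strict majority)."""
    if sentinels < 1:
        raise ValueError("at least one Sentinel is required")
    return sentinels // 2 + 1


def tolerated_sentinel_failures(sentinels):
    """How many Sentinels can be down while a majority can still authorize a failover."""
    return sentinels - recommended_quorum(sentinels)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(n, recommended_quorum(n), tolerated_sentinel_failures(n))
```

Note that 4 Sentinels tolerate no more failures than 3, which is one reason odd Sentinel counts are the usual choice.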