Stuck in CrashLoopBackOff

Source: 12-6 Monitoring Deployment in Practice - Helm + Prometheus Operator

祁云逸

2020-10-20

[root@m1 12-monitoring]# kubectl get pod -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-imooc-prom-prometheus-oper-alertmanager-0 1/2 CrashLoopBackOff 37 175m 172.22.4.7 s1

[root@m1 12-monitoring]# kubectl logs alertmanager-imooc-prom-prometheus-oper-alertmanager-0 -n monitoring -c alertmanager
level=info ts=2020-10-20T14:48:06.822238078Z caller=main.go:177 msg="Starting Alertmanager" version="(version=0.16.2, branch=HEAD, revision=308b7620642dc147794e6686a3f94d1b6fc8ef4d)"
level=info ts=2020-10-20T14:48:06.822365431Z caller=main.go:178 build_context="(go=go1.11.6, user=root@1e9a48272b38, date=20190405-12:27:40)"
level=warn ts=2020-10-20T14:48:06.845648386Z caller=cluster.go:226 component=cluster msg="failed to join cluster" err="1 error occurred:\n\n* Failed to resolve alertmanager-imooc-prom-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:6783: lookup alertmanager-imooc-prom-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc on 10.254.0.2:53: no such host"
level=info ts=2020-10-20T14:48:06.845699144Z caller=cluster.go:228 component=cluster msg="will retry joining cluster every 10s"
level=warn ts=2020-10-20T14:48:06.845713226Z caller=main.go:268 msg="unable to join gossip mesh" err="1 error occurred:\n\n* Failed to resolve alertmanager-imooc-prom-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc:6783: lookup alertmanager-imooc-prom-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc on 10.254.0.2:53: no such host"
level=info ts=2020-10-20T14:48:06.845985422Z caller=cluster.go:632 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2020-10-20T14:48:06.879406118Z caller=main.go:334 msg="Loading configuration file" file=/etc/alertmanager/config/alertmanager.yaml
level=error ts=2020-10-20T14:48:06.879775328Z caller=main.go:337 msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="missing to address in email config"
level=info ts=2020-10-20T14:48:06.880365867Z caller=cluster.go:641 component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=34.326927ms
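Note the `level=error` line: the process is actually crashing because the loaded alertmanager.yaml has an email receiver without a `to:` address; the DNS warnings above it are retried and do not kill the pod. A minimal sketch of a valid email receiver is below -- the SMTP host and addresses are placeholders, not values from this course:

```yaml
# Hypothetical minimal alertmanager.yaml; all hosts/addresses are placeholders.
global:
  smtp_smarthost: 'smtp.example.com:587'   # assumed SMTP relay
  smtp_from: 'alertmanager@example.com'
route:
  receiver: 'email-default'
receivers:
- name: 'email-default'
  email_configs:
  - to: 'ops@example.com'   # the "to" address whose absence produced the error above
```

With the Helm chart, this config typically comes from the chart's Alertmanager config value and is rendered into the Secret mounted at /etc/alertmanager/config/alertmanager.yaml, so the fix belongs in your values file rather than in the pod.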

Teacher, how do I fix this? I can't figure it out and searching turns up nothing.
At first the health check was failing on port 9093; after I deleted the pod and looked at the logs again, this is what I got. I suspect it's also network-related. How can I solve it?


1 Answer

刘果国

2020-10-21

First, check whether this service actually exists: alertmanager-imooc-prom-prometheus-oper-alertmanager-0.alertmanager-operated.monitoring.svc

If it exists, the problem is most likely DNS, so look at the DNS component's logs and see whether it is healthy.

If it doesn't exist, things are trickier: that service is created automatically by the Operator, so one of the earlier steps probably went wrong. I'd suggest redeploying Prometheus Operator.
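The checks above can be sketched with kubectl. This is a diagnostic sketch, not output from this course: it assumes a live cluster, and the CoreDNS label selector may differ on older clusters that still run kube-dns.

```shell
# 1. Does the headless Service behind the pod's DNS name exist?
kubectl get svc -n monitoring alertmanager-operated

# 2. Try to resolve the name Alertmanager failed to look up, from inside the cluster.
#    busybox:1.28 is commonly used here because nslookup in newer busybox images is unreliable.
kubectl run dns-test -n monitoring --rm -it --restart=Never \
  --image=busybox:1.28 -- nslookup alertmanager-operated.monitoring.svc

# 3. If resolution fails, inspect the DNS component's logs (CoreDNS uses the k8s-app=kube-dns label).
kubectl logs -n kube-system -l k8s-app=kube-dns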

刘果国 replied to 祁云逸:
The group is maintained by the imooc staff; they should handle it soon.
2020-10-23
5 replies in total

Kubernetes in Production: The Complete Hands-On Journey

Every detail of one Internet company's journey of bringing Kubernetes to production

2293 learners · 2211 questions
