Prometheus Alertmanager "connection refused" problem
Source: 12-7 Monitoring rollout: metric refinement, Grafana dashboards, and email alerting (Part 1)

weixin_慕先生4121857
2019-12-10
kubectl get pod -n monitoring
NAME                                                     READY   STATUS             RESTARTS   AGE
alertmanager-belle-prom-prometheus-oper-alertmanager-0   1/2     CrashLoopBackOff   26         82m
belle-prom-grafana-69fd957c8b-52t8p                      2/2     Running            0          82m
belle-prom-kube-state-metrics-965fdbf7f-qx46m            1/1     Running            0          82m
belle-prom-prometheus-node-exporter-p2p2f                1/1     Running            0          82m
belle-prom-prometheus-node-exporter-qxjhx                1/1     Running            0          82m
belle-prom-prometheus-node-exporter-rhxk4                1/1     Running            0          82m
belle-prom-prometheus-oper-operator-78b4d5fd6f-8fzq6     1/1     Running            0          82m
prometheus-belle-prom-prometheus-oper-prometheus-0       3/3     Running            0          82m
I then ran kubectl describe pod alertmanager-belle-prom-prometheus-oper-alertmanager-0 -n monitoring to look at the error:
Warning BackOff 4m58s (x271 over 82m) kubelet, sz19f-kubernetes-node-dev-10-0-43-75-vm.belle.lan Back-off restarting failed container
Warning Unhealthy 2s (x264 over 85m) kubelet, sz19f-kubernetes-node-dev-10-0-43-75-vm.belle.lan Readiness probe failed: Get http://172.22.1.143:9093/api/v1/status: dial tcp 172.22.1.143:9093: connect: connection refused
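The readiness probe is getting "connection refused". What is causing this, and how should I troubleshoot it? This is the only problem left; everything else has been resolved.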
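In the pod listing above, the Alertmanager pod shows READY 1/2 with CrashLoopBackOff, so the refused connection on port 9093 is probably a symptom of the container exiting before it can listen, not a network fault. As a first triage step, a small filter like the one below can list the pods whose containers are not all ready. This is only a sketch: the function name not_ready_pods is made up for illustration, and it assumes the default column layout of kubectl get pod.

```shell
# Print the name and status of every pod whose READY column is not N/N.
# Reads `kubectl get pod` output (with or without the header row) on stdin.
not_ready_pods() {
  awk 'NR == 1 && $1 == "NAME" { next }            # skip the header row
       { split($2, r, "/"); if (r[1] != r[2]) print $1, $3 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pod -n monitoring | not_ready_pods
```

Any pod it prints is a candidate for kubectl describe and kubectl logs.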
5 answers
qq_慕侠2000703
2021-06-02
Has this been solved?
pkarqi001
2020-01-12
Name:               alertmanager-main-0
Namespace:          monitoring
Priority:           0
PriorityClassName:  <none>
Node:               10.69.36.90/10.69.36.90
Start Time:         Wed, 20 Nov 2019 00:33:42 +0800
Labels:             alertmanager=main
                    app=alertmanager
                    controller-revision-hash=alertmanager-main-6bd8d9f997
                    statefulset.kubernetes.io/pod-name=alertmanager-main-0
Annotations:        <none>
Status:             Running
IP:                 172.17.10.4
Controlled By:      StatefulSet/alertmanager-main
Containers:
  alertmanager:
    Container ID:  docker://787b392bb09b7f29eb131a22faa8de04b4f469142295704fbde26ca52fe5f94e
    Image:         quay.io/prometheus/alertmanager:v0.15.0
    Image ID:      docker-pullable://quay.io/prometheus/alertmanager@sha256:0ed4a8f776c5570b9e8152a670d3087a73164b20476a6a94768468759fbb5ad8
    Ports:         9093/TCP, 6783/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --config.file=/etc/alertmanager/config/alertmanager.yaml
      --cluster.listen-address=$(POD_IP):6783
      --storage.path=/alertmanager
      --data.retention=120h
      --web.listen-address=:9093
      --web.route-prefix=/
      --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:6783
      --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:6783
      --cluster.peer=alertmanager-main-2.alertmanager-operated.monitoring.svc:6783
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sun, 12 Jan 2020 22:11:52 +0800
      Finished:     Sun, 12 Jan 2020 22:13:31 +0800
    Ready:          False
    Restart Count:  18544
    Requests:
      memory:     200Mi
    Liveness:     http-get http://:web/api/v1/status delay=0s timeout=3s period=10s #success=1 #failure=10
    Readiness:    http-get http://:web/api/v1/status delay=3s timeout=3s period=5s #success=1 #failure=10
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /alertmanager from alertmanager-main-db (rw)
      /etc/alertmanager/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-t2qhg (ro)
  config-reloader:
    Container ID:  docker://5fce41324cd097803ff3c7af67e761f7430b50a86b97b5c1d3e847619913e5c2
    Image:         quay.io/coreos/configmap-reload:v0.0.1
    Image ID:      docker-pullable://quay.io/coreos/configmap-reload@sha256:e2fd60ff0ae4500a75b80ebaa30e0e7deba9ad107833e8ca53f0047c42c5a057
    Port:          <none>
    Host Port:     <none>
    Args:
      -webhook-url=http://localhost:9093/-/reload
      -volume-dir=/etc/alertmanager/config
    State:          Running
      Started:      Wed, 20 Nov 2019 00:36:25 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     5m
      memory:  10Mi
    Requests:
      cpu:        5m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /etc/alertmanager/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-t2qhg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main
    Optional:    false
  alertmanager-main-db:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  alertmanager-main-token-t2qhg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main-token-t2qhg
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                  Message
  ----     ------     ----                    ----                  -------
  Warning  Unhealthy  11m (x348263 over 53d)  kubelet, 10.69.36.90  Readiness probe failed: Get http://172.17.10.4:9093/api/v1/status: dial tcp 172.17.10.4:9093: connect: connection refused
  Warning  BackOff    68s (x227094 over 53d)  kubelet, 10.69.36.90  Back-off restarting failed container
weixin_慕先生4121857
Asker
2019-12-11
Solved; please disregard.
刘果国
2019-12-11
Please post the alert-manager container logs.
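One possible way to collect those logs is sketched below. The helper name crash_logs and the KUBECTL override are hypothetical conveniences, not part of kubectl. The default container name alertmanager is taken from the describe output earlier in this thread and is an assumption for any other installation. The --previous flag asks kubectl for the logs of the last terminated instance of the container, which is usually where the crash reason appears.

```shell
# Fetch the last 100 log lines from the previous (crashed) instance of a container.
# Arguments: pod name, namespace, optional container name (assumed default: alertmanager).
# KUBECTL may be overridden, e.g. KUBECTL=echo prints the arguments instead of running kubectl.
crash_logs() {
  "${KUBECTL:-kubectl}" logs "$1" -n "$2" -c "${3:-alertmanager}" --previous --tail=100
}

# Usage against a live cluster (requires kubectl access):
#   crash_logs alertmanager-belle-prom-prometheus-oper-alertmanager-0 monitoring
```

If the previous instance left no logs, dropping --previous shows the output of the currently running attempt.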
weixin_慕先生4121857
Asker
2019-12-10
I spent a whole day getting this Prometheus installed.