Prometheus Alertmanager "connection refused" problem

Source: 12-7 Monitoring in Practice - Completing Metrics, Grafana Dashboards and Email Alerts (Part 1)

weixin_慕先生4121857

2019-12-10

kubectl get pod -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-belle-prom-prometheus-oper-alertmanager-0 1/2 CrashLoopBackOff 26 82m
belle-prom-grafana-69fd957c8b-52t8p 2/2 Running 0 82m
belle-prom-kube-state-metrics-965fdbf7f-qx46m 1/1 Running 0 82m
belle-prom-prometheus-node-exporter-p2p2f 1/1 Running 0 82m
belle-prom-prometheus-node-exporter-qxjhx 1/1 Running 0 82m
belle-prom-prometheus-node-exporter-rhxk4 1/1 Running 0 82m
belle-prom-prometheus-oper-operator-78b4d5fd6f-8fzq6 1/1 Running 0 82m
prometheus-belle-prom-prometheus-oper-prometheus-0 3/3 Running 0 82m

Then I looked at the error via kubectl describe pod alertmanager-belle-prom-prometheus-oper-alertmanager-0 -n monitoring:
Warning BackOff 4m58s (x271 over 82m) kubelet, sz19f-kubernetes-node-dev-10-0-43-75-vm.belle.lan Back-off restarting failed container
Warning Unhealthy 2s (x264 over 85m) kubelet, sz19f-kubernetes-node-dev-10-0-43-75-vm.belle.lan Readiness probe failed: Get http://172.22.1.143:9093/api/v1/status: dial tcp 172.22.1.143:9093: connect: connection refused
The readiness probe gets connection refused when it checks the pod. How should I troubleshoot this, and what could the cause be? This is the only problem left; everything else has been resolved.
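For context, the kubelet's HTTP readiness probe is just a GET against the pod IP and probe port; "connection refused" means nothing is listening on :9093 at all, which usually means the alertmanager process exits before it can bind the port (often because of a broken alertmanager.yaml). A minimal sketch of what the probe does, reproducing the error against a port with no listener:

```python
import http.client
import socket

def probe(host, port, path="/api/v1/status", timeout=3.0):
    """Mimic the kubelet's HTTP readiness probe: GET http://host:port/path."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", path)
        return True, conn.getresponse().status
    except ConnectionRefusedError:
        return False, "connection refused"   # the error in the question's events
    except OSError as exc:                   # timeouts, unreachable hosts, ...
        return False, str(exc)

# Probing a port nothing listens on reproduces the event message:
# grab a free localhost port, release it, then probe it.
s = socket.socket()
s.bind(("127.0.0.1", 0))
free_port = s.getsockname()[1]
s.close()
print(probe("127.0.0.1", free_port))   # (False, 'connection refused')
```

So the real question is not the probe itself but why the container dies before listening, which is what the container logs will show.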


5 Answers

qq_慕侠2000703

2021-06-02

Did you ever solve this?


pkarqi001

2020-01-12

Name:               alertmanager-main-0
Namespace:          monitoring
Priority:           0
PriorityClassName:  <none>
Node:               10.69.36.90/10.69.36.90
Start Time:         Wed, 20 Nov 2019 00:33:42 +0800
Labels:             alertmanager=main
                    app=alertmanager
                    controller-revision-hash=alertmanager-main-6bd8d9f997
                    statefulset.kubernetes.io/pod-name=alertmanager-main-0
Annotations:        <none>
Status:             Running
IP:                 172.17.10.4
Controlled By:      StatefulSet/alertmanager-main
Containers:
  alertmanager:
    Container ID:  docker://787b392bb09b7f29eb131a22faa8de04b4f469142295704fbde26ca52fe5f94e
    Image:         quay.io/prometheus/alertmanager:v0.15.0
    Image ID:      docker-pullable://quay.io/prometheus/alertmanager@sha256:0ed4a8f776c5570b9e8152a670d3087a73164b20476a6a94768468759fbb5ad8
    Ports:         9093/TCP, 6783/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --config.file=/etc/alertmanager/config/alertmanager.yaml
      --cluster.listen-address=$(POD_IP):6783
      --storage.path=/alertmanager
      --data.retention=120h
      --web.listen-address=:9093
      --web.route-prefix=/
      --cluster.peer=alertmanager-main-0.alertmanager-operated.monitoring.svc:6783
      --cluster.peer=alertmanager-main-1.alertmanager-operated.monitoring.svc:6783
      --cluster.peer=alertmanager-main-2.alertmanager-operated.monitoring.svc:6783
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Sun, 12 Jan 2020 22:11:52 +0800
      Finished:     Sun, 12 Jan 2020 22:13:31 +0800
    Ready:          False
    Restart Count:  18544
    Requests:
      memory:   200Mi
    Liveness:   http-get http://:web/api/v1/status delay=0s timeout=3s period=10s #success=1 #failure=10
    Readiness:  http-get http://:web/api/v1/status delay=3s timeout=3s period=5s #success=1 #failure=10
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /alertmanager from alertmanager-main-db (rw)
      /etc/alertmanager/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-t2qhg (ro)
  config-reloader:
    Container ID:  docker://5fce41324cd097803ff3c7af67e761f7430b50a86b97b5c1d3e847619913e5c2
    Image:         quay.io/coreos/configmap-reload:v0.0.1
    Image ID:      docker-pullable://quay.io/coreos/configmap-reload@sha256:e2fd60ff0ae4500a75b80ebaa30e0e7deba9ad107833e8ca53f0047c42c5a057
    Port:          <none>
    Host Port:     <none>
    Args:
      -webhook-url=http://localhost:9093/-/reload
      -volume-dir=/etc/alertmanager/config
    State:          Running
      Started:      Wed, 20 Nov 2019 00:36:25 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     5m
      memory:  10Mi
    Requests:
      cpu:        5m
      memory:     10Mi
    Environment:  <none>
    Mounts:
      /etc/alertmanager/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-t2qhg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main
    Optional:    false
  alertmanager-main-db:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  alertmanager-main-token-t2qhg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main-token-t2qhg
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                  Message
  ----     ------     ----                    ----                  -------
  Warning  Unhealthy  11m (x348263 over 53d)  kubelet, 10.69.36.90  Readiness probe failed: Get http://172.17.10.4:9093/api/v1/status: dial tcp 172.17.10.4:9093: connect: connection refused
  Warning  BackOff    68s (x227094 over 53d)  kubelet, 10.69.36.90  Back-off restarting failed container
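The describe output shows that config-volume is populated from the Secret alertmanager-main; if the alertmanager.yaml inside that Secret is malformed, the container exits (Exit Code 2 above) before it ever listens on :9093. Secret data is base64-encoded, so a quick sanity check is to dump and decode it. A sketch, using a hypothetical sample payload standing in for real `kubectl get secret alertmanager-main -n monitoring -o json` output:

```python
import base64
import json

# Hypothetical stand-in for `kubectl get secret alertmanager-main -o json`;
# a real cluster would return the Secret's actual contents here.
sample_secret_json = json.dumps({
    "metadata": {"name": "alertmanager-main", "namespace": "monitoring"},
    "data": {
        "alertmanager.yaml": base64.b64encode(
            b"route:\n  receiver: 'null'\nreceivers:\n- name: 'null'\n"
        ).decode()
    },
})

# Secret values are base64-encoded strings under .data; decode to inspect
# the alertmanager.yaml the pod is actually running with.
secret = json.loads(sample_secret_json)
config = base64.b64decode(secret["data"]["alertmanager.yaml"]).decode()
print(config)
```

If the decoded config has a syntax error or invalid receiver, fixing it and letting the operator roll the Secret out is usually enough to stop the crash loop.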



weixin_慕先生4121857

Original poster

2019-12-11

Solved; you can ignore this.

慕圣5586765
replied to pkarqi001:
At line 117 of ./prometheus-operator/values.yaml, fill your own recipient mailbox into to: ""
2020-04-19
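For reference, that fix lands in the Alertmanager config section of the chart's values.yaml. A hedged sketch of the relevant fragment (exact key paths and the "line 117" location vary between chart versions; the receiver name and address below are placeholders):

```yaml
alertmanager:
  config:
    receivers:
      - name: email
        email_configs:
          - to: "you@example.com"   # was to: "" -- an empty recipient makes the config invalid
```

With an empty to: "", Alertmanager refuses the config at startup, exits, and the pod lands in CrashLoopBackOff with the readiness probe seeing connection refused.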

刘果国

2019-12-11

Please post the alertmanager container logs.

祁云逸
replied to weixin_慕先生4121857:
How did you solve it, man?
2020-10-24

weixin_慕先生4121857

Original poster

2019-12-10

I've spent a whole day installing this Prometheus.



Kubernetes生产落地全程实践
A blow-by-blow account of one internet company's Kubernetes production rollout
