The Alertmanager deployed alongside Grafana does not fully start
Source: 8-2 Go PaaS Platform – Prometheus Monitoring Installation

qq_盒子_4
2023-01-04
Pod status:
➜ ~ kubectl get pod -n monitoring -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 1/2 CrashLoopBackOff 57 3h48m 10.244.36.100 k8s-node1 <none> <none>
alertmanager-main-1 1/2 CrashLoopBackOff 57 3h48m 10.244.36.105 k8s-node1 <none> <none>
alertmanager-main-2 1/2 CrashLoopBackOff 57 3h48m 10.244.36.104 k8s-node1 <none> <none>
blackbox-exporter-6798fb5bb4-2x6sl 3/3 Running 0 3h51m 10.244.169.174 k8s-node2 <none> <none>
grafana-7476b4c65b-hw6qq 1/1 Running 0 3h51m 10.244.169.176 k8s-node2 <none> <none>
kube-state-metrics-649f8745b6-f2m5b 3/3 Running 0 3h51m 10.244.169.175 k8s-node2 <none> <none>
node-exporter-6pn5h 2/2 Running 0 3h51m 172.16.237.135 k8s-node1 <none> <none>
node-exporter-l4g8t 2/2 Running 0 3h51m 172.16.237.136 k8s-node2 <none> <none>
node-exporter-lht2r 2/2 Running 0 3h51m 172.16.237.134 k8s-master <none> <none>
prometheus-adapter-664d7c47b-2p29t 1/1 Running 0 3h51m 10.244.36.98 k8s-node1 <none> <none>
prometheus-adapter-664d7c47b-4fmbr 1/1 Running 0 3h51m 10.244.36.102 k8s-node1 <none> <none>
prometheus-k8s-0 2/2 Running 0 3h51m 10.244.169.177 k8s-node2 <none> <none>
prometheus-k8s-1 2/2 Running 0 3h51m 10.244.36.101 k8s-node1 <none> <none>
prometheus-operator-75d9b475d9-27l7f 2/2 Running 0 15h 10.244.169.188 k8s-node2 <none> <none>
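Note that all three alertmanager-main replicas landed on k8s-node1, while the other workloads are spread across nodes. If the crashes are resource-related, that node's headroom is worth checking first. A minimal sketch (guarded so it is a no-op without cluster access; the node name is taken from the output above):

```shell
# Check capacity on the node hosting all three Alertmanager replicas.
# `kubectl top` requires metrics-server; the guard skips everything
# when kubectl is not available in the current shell.
if command -v kubectl >/dev/null 2>&1; then
  kubectl top node k8s-node1
  kubectl describe node k8s-node1 | grep -A 8 "Allocated resources"
fi
```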
➜ ~ kubectl describe pod alertmanager-main-0 -n monitoring
Name: alertmanager-main-0
Namespace: monitoring
Priority: 0
Node: k8s-node1/172.16.237.135
Start Time: Thu, 29 Dec 2022 15:22:12 +0800
Labels: alertmanager=main
app=alertmanager
app.kubernetes.io/component=alert-router
app.kubernetes.io/instance=main
app.kubernetes.io/managed-by=prometheus-operator
app.kubernetes.io/name=alertmanager
app.kubernetes.io/part-of=kube-prometheus
app.kubernetes.io/version=0.22.2
controller-revision-hash=alertmanager-main-7957cff7
statefulset.kubernetes.io/pod-name=alertmanager-main-0
Annotations: cni.projectcalico.org/containerID: 30604bfdaae834f90eb30e079d77e266eeb2932f88dc309f56dc5735f57d1b08
cni.projectcalico.org/podIP: 10.244.36.100/32
cni.projectcalico.org/podIPs: 10.244.36.100/32
kubectl.kubernetes.io/default-container: alertmanager
Status: Running
IP: 10.244.36.100
IPs:
IP: 10.244.36.100
Controlled By: StatefulSet/alertmanager-main
Containers:
alertmanager:
Container ID: docker://f1165ee2d907f7cd6a76bfa42ad05b0d6af9a5d2a314db626630bcac3ea1f5cc
Image: quay.io/prometheus/alertmanager:v0.22.2
Image ID: docker-pullable://quay.io/prometheus/alertmanager@sha256:624c1a5063c7c80635081a504c3e1b020d89809651978eb5d0b652a394f3022d
Ports: 9093/TCP, 9094/TCP, 9094/UDP
Host Ports: 0/TCP, 0/TCP, 0/UDP
Args:
--config.file=/etc/alertmanager/config/alertmanager.yaml
--storage.path=/alertmanager
--data.retention=120h
--cluster.listen-address=[$(POD_IP)]:9094
--web.listen-address=:9093
--web.route-prefix=/
--cluster.peer=alertmanager-main-0.alertmanager-operated:9094
--cluster.peer=alertmanager-main-1.alertmanager-operated:9094
--cluster.peer=alertmanager-main-2.alertmanager-operated:9094
--cluster.reconnect-timeout=5m
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Message: level=info ts=2022-12-29T11:06:12.318Z caller=main.go:221 msg="Starting Alertmanager" version="(version=0.22.2, branch=HEAD, revision=44f8adc06af5101ad64bd8b9c8b18273f2922051)"
level=info ts=2022-12-29T11:06:12.318Z caller=main.go:222 build_context="(go=go1.16.4, user=root@b595c7f32520, date=20210602-07:52:09)"
Exit Code: 2
Started: Thu, 29 Dec 2022 19:06:12 +0800
Finished: Thu, 29 Dec 2022 19:07:52 +0800
Ready: False
Restart Count: 57
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 4m
memory: 100Mi
Liveness: http-get http://:web/-/healthy delay=0s timeout=3s period=10s #success=1 #failure=10
Readiness: http-get http://:web/-/ready delay=3s timeout=3s period=5s #success=1 #failure=10
Environment:
POD_IP: (v1:status.podIP)
Mounts:
/alertmanager from alertmanager-main-db (rw)
/etc/alertmanager/certs from tls-assets (ro)
/etc/alertmanager/config from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mmqs7 (ro)
config-reloader:
Container ID: docker://9566d643e31ee3db3e1e02b10c959721a224097887242f7a3e74a67e58e92fa5
Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.49.0
Image ID: docker-pullable://quay.io/prometheus-operator/prometheus-config-reloader@sha256:61bd63e7bc1aaebd39983d2c118a453e59427ccaa1b188cbadd4d0bded415d17
Port: <none>
Host Port: <none>
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=:8080
--reload-url=http://localhost:9093/-/reload
--watched-dir=/etc/alertmanager/config
State: Running
Started: Thu, 29 Dec 2022 15:22:13 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: alertmanager-main-0 (v1:metadata.name)
SHARD: -1
Mounts:
/etc/alertmanager/config from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mmqs7 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: Secret (a volume populated by a Secret)
SecretName: alertmanager-main-generated
Optional: false
tls-assets:
Type: Secret (a volume populated by a Secret)
SecretName: alertmanager-main-tls-assets
Optional: false
alertmanager-main-db:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-mmqs7:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled <invalid> (x47 over 135m) kubelet Container image "quay.io/prometheus/alertmanager:v0.22.2" already present on machine
Warning BackOff <invalid> (x620 over 123m) kubelet Back-off restarting failed container
Warning Unhealthy <invalid> (x1151 over 135m) kubelet Readiness probe failed: Get "http://10.244.36.100:9093/-/ready": dial tcp 10.244.36.100:9093: connect: connection refused
➜ ~
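The events only show that the readiness probe was refused, and the Message in the describe output is truncated before the actual failure. To see why the alertmanager container exits with code 2, its own logs, including the previous crashed instance, are more useful. A sketch of the usual diagnosis commands (guarded so it is a no-op without cluster access):

```shell
# Logs from the last terminated alertmanager container (-c selects the
# container inside the pod, --previous reads the crashed instance).
if command -v kubectl >/dev/null 2>&1; then
  kubectl logs alertmanager-main-0 -n monitoring -c alertmanager --previous
  # All events recorded for this pod, not just the describe summary.
  kubectl get events -n monitoring \
    --field-selector involvedObject.name=alertmanager-main-0
fi
```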
StatefulSet details:
➜ ~ kubectl get statefulset -n monitoring -o wide
NAME READY AGE CONTAINERS IMAGES
alertmanager-main 0/3 3h45m alertmanager,config-reloader quay.io/prometheus/alertmanager:v0.22.2,quay.io/prometheus-operator/prometheus-config-reloader:v0.49.0
prometheus-k8s 2/2 3h48m prometheus,config-reloader quay.io/prometheus/prometheus:v2.29.1,quay.io/prometheus-operator/prometheus-config-reloader:v0.49.0
➜ ~ kubectl describe statefulset alertmanager-main -n monitoring
Name: alertmanager-main
Namespace: monitoring
CreationTimestamp: Thu, 29 Dec 2022 13:48:46 +0800
Selector: alertmanager=main,app=alertmanager,app.kubernetes.io/instance=main,app.kubernetes.io/managed-by=prometheus-operator,app.kubernetes.io/name=alertmanager
Labels: alertmanager=main
app.kubernetes.io/component=alert-router
app.kubernetes.io/name=alertmanager
app.kubernetes.io/part-of=kube-prometheus
app.kubernetes.io/version=0.22.2
Annotations: prometheus-operator-input-hash: 6969648538193564551
Replicas: 3 desired | 3 total
Update Strategy: RollingUpdate
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: alertmanager=main
app=alertmanager
app.kubernetes.io/component=alert-router
app.kubernetes.io/instance=main
app.kubernetes.io/managed-by=prometheus-operator
app.kubernetes.io/name=alertmanager
app.kubernetes.io/part-of=kube-prometheus
app.kubernetes.io/version=0.22.2
Annotations: kubectl.kubernetes.io/default-container: alertmanager
Service Account: alertmanager-main
Containers:
alertmanager:
Image: quay.io/prometheus/alertmanager:v0.22.2
Ports: 9093/TCP, 9094/TCP, 9094/UDP
Host Ports: 0/TCP, 0/TCP, 0/UDP
Args:
--config.file=/etc/alertmanager/config/alertmanager.yaml
--storage.path=/alertmanager
--data.retention=120h
--cluster.listen-address=[$(POD_IP)]:9094
--web.listen-address=:9093
--web.route-prefix=/
--cluster.peer=alertmanager-main-0.alertmanager-operated:9094
--cluster.peer=alertmanager-main-1.alertmanager-operated:9094
--cluster.peer=alertmanager-main-2.alertmanager-operated:9094
--cluster.reconnect-timeout=5m
Limits:
cpu: 100m
memory: 100Mi
Requests:
cpu: 4m
memory: 100Mi
Liveness: http-get http://:web/-/healthy delay=0s timeout=3s period=10s #success=1 #failure=10
Readiness: http-get http://:web/-/ready delay=3s timeout=3s period=5s #success=1 #failure=10
Environment:
POD_IP: (v1:status.podIP)
Mounts:
/alertmanager from alertmanager-main-db (rw)
/etc/alertmanager/certs from tls-assets (ro)
/etc/alertmanager/config from config-volume (rw)
config-reloader:
Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.49.0
Port: <none>
Host Port: <none>
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=:8080
--reload-url=http://localhost:9093/-/reload
--watched-dir=/etc/alertmanager/config
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: (v1:metadata.name)
SHARD: -1
Mounts:
/etc/alertmanager/config from config-volume (ro)
Volumes:
config-volume:
Type: Secret (a volume populated by a Secret)
SecretName: alertmanager-main-generated
Optional: false
tls-assets:
Type: Secret (a volume populated by a Secret)
SecretName: alertmanager-main-tls-assets
Optional: false
alertmanager-main-db:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Volume Claims: <none>
Events: <none>
➜ ~
1 Answer
Cap
2023-01-09
Look at the detailed event logs. In general, a startup failure like this is often caused by insufficient resources.
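If resources do turn out to be the cause: the StatefulSet is owned by prometheus-operator, so edits made to it directly get reverted; the limits have to be raised on the Alertmanager custom resource instead. A sketch of the relevant part of the kube-prometheus manifest (the 500m/200Mi figures are example values, not recommendations; the describe output above shows the shipped cpu limit of 100m):

```yaml
# Excerpt of the Alertmanager custom resource managed by prometheus-operator
# (in kube-prometheus this is manifests/alertmanager-alertmanager.yaml).
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: main
  namespace: monitoring
spec:
  replicas: 3
  resources:
    limits:
      cpu: 500m      # example value; the default 100m can be tight
      memory: 200Mi  # example value
    requests:
      cpu: 100m
      memory: 100Mi
```

After `kubectl apply -f` on the edited manifest, the operator rolls the StatefulSet and the pods restart with the new limits.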