etcd fails to restart after a power outage at our company. The errors are roughly as follows; how should I troubleshoot this?

Source: 4-4 Cluster smoke test

慕前端304893

2022-09-13

etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2022-09-13 21:50:11 CST; 456ms ago
Docs: https://github.com/coreos
Process: 4171 ExecStart=/usr/local/bin/etcd --name node-3 --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kubernetes-key.pem --trusted-ca-file=/etc/etcd/ca.pem --peer-trusted-ca-file=/etc/etcd/ca.pem --peer-client-cert-auth --client-cert-auth --initial-advertise-peer-urls https://192.168.10.43:2380 --listen-peer-urls https://192.168.10.43:2380 --listen-client-urls https://192.168.10.43:2379,https://127.0.0.1:2379 --advertise-client-urls https://192.168.10.43:2379 --initial-cluster-token etcd-cluster-0 --initial-cluster node-1=https://192.168.10.41:2380,node-2=https://192.168.10.42:2380,node-3=https://192.168.10.43:2380 --initial-cluster-state new --data-dir=/var/lib/etcd (code=exited, status=2)
Main PID: 4171 (code=exited, status=2)

Sep 13 21:50:11 node-3 etcd[4171]: go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2()
Sep 13 21:50:11 node-3 etcd[4171]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdmain/etcd.go:144 +0x2f71
Sep 13 21:50:11 node-3 etcd[4171]: go.etcd.io/etcd/etcdmain.Main()
Sep 13 21:50:11 node-3 etcd[4171]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdmain/main.go:46 +0x38
Sep 13 21:50:11 node-3 etcd[4171]: main.main()
Sep 13 21:50:11 node-3 etcd[4171]: /tmp/etcd-release-3.4.10/etcd/release/etcd/main.go:28 +0x20
Sep 13 21:50:11 node-3 systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 13 21:50:11 node-3 systemd[1]: Failed to start etcd.
Sep 13 21:50:11 node-3 systemd[1]: Unit etcd.service entered failed state.
Sep 13 21:50:11 node-3 systemd[1]: etcd.service failed.
[root@node-3 ~]# journalctl -xe
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit etcd.service has begun starting up.
Sep 13 21:50:16 node-3 etcd[4197]: [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
Sep 13 21:50:16 node-3 etcd[4197]: etcd Version: 3.4.10
Sep 13 21:50:16 node-3 etcd[4197]: [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
Sep 13 21:50:16 node-3 etcd[4197]: Git SHA: 18dfb9cca
Sep 13 21:50:16 node-3 etcd[4197]: Go Version: go1.12.17
Sep 13 21:50:16 node-3 etcd[4197]: Go OS/Arch: linux/amd64
Sep 13 21:50:16 node-3 etcd[4197]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Sep 13 21:50:16 node-3 etcd[4197]: the server is already initialized as member before, starting as etcd member…
Sep 13 21:50:16 node-3 etcd[4197]: peerTLS: cert = /etc/etcd/kubernetes.pem, key = /etc/etcd/kubernetes-key.pem, trusted-ca = /etc/etcd/ca.pem, client-cert-auth = true, crl-file =
Sep 13 21:50:16 node-3 etcd[4197]: name = node-3
Sep 13 21:50:16 node-3 etcd[4197]: data dir = /var/lib/etcd
Sep 13 21:50:16 node-3 etcd[4197]: member dir = /var/lib/etcd/member
Sep 13 21:50:16 node-3 etcd[4197]: heartbeat = 100ms
Sep 13 21:50:16 node-3 etcd[4197]: election = 1000ms
Sep 13 21:50:16 node-3 etcd[4197]: snapshot count = 100000
Sep 13 21:50:16 node-3 etcd[4197]: advertise client URLs = https://192.168.10.43:2379
Sep 13 21:50:16 node-3 etcd[4197]: initial advertise peer URLs = https://192.168.10.43:2380
Sep 13 21:50:16 node-3 etcd[4197]: initial cluster =
Sep 13 21:50:16 node-3 etcd[4197]: recovered store from snapshot at index 2700029
Sep 13 21:50:16 node-3 etcd[4197]: recovering backend from snapshot error: failed to find database snapshot file (snap: snapshot file doesn't exist)
Sep 13 21:50:16 node-3 etcd[4197]: panic: recovering backend from snapshot error: failed to find database snapshot file (snap: snapshot file doesn't exist)
Sep 13 21:50:16 node-3 etcd[4197]: panic: runtime error: invalid memory address or nil pointer dereference
Sep 13 21:50:16 node-3 etcd[4197]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc0526e]
Sep 13 21:50:16 node-3 etcd[4197]: goroutine 1 [running]:
Sep 13 21:50:16 node-3 etcd[4197]: go.etcd.io/etcd/etcdserver.NewServer.func1(0xc0002e2e98, 0xc0002e0e58)
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdserver/server.go:334 +0x3e
Sep 13 21:50:16 node-3 etcd[4197]: panic(0xee5760, 0xc0001c9300)
Sep 13 21:50:16 node-3 etcd[4197]: /usr/local/go/src/runtime/panic.go:522 +0x1b5
Sep 13 21:50:16 node-3 etcd[4197]: github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc0001c1380, 0x10bffe6, 0x2a, 0xc0002e0f28, 0x1, 0x1)
Sep 13 21:50:16 node-3 etcd[4197]: /home/ANT.AMAZON.COM/leegyuho/go/pkg/mod/github.com/coreos/pkg@v0.0.0-20160727233714-3ac0863d7acf/capnslog/pkg_logger.go:75 +0x135
Sep 13 21:50:16 node-3 etcd[4197]: go.etcd.io/etcd/etcdserver.NewServer(0x7ffdd403bca4, 0x6, 0x0, 0x0, 0x0, 0x0, 0xc000243300, 0x1, 0x1, 0xc000242e80, …)
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdserver/server.go:464 +0x433c
Sep 13 21:50:16 node-3 etcd[4197]: go.etcd.io/etcd/embed.StartEtcd(0xc0002a0000, 0xc0002a0580, 0x0, 0x0)
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/embed/etcd.go:213 +0x9c0
Sep 13 21:50:16 node-3 etcd[4197]: go.etcd.io/etcd/etcdmain.startEtcd(0xc0002a0000, 0x10950b4, 0x6, 0x1, 0xc00021f170)
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdmain/etcd.go:302 +0x40
Sep 13 21:50:16 node-3 etcd[4197]: go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2()
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdmain/etcd.go:144 +0x2f71
Sep 13 21:50:16 node-3 etcd[4197]: go.etcd.io/etcd/etcdmain.Main()
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/etcdmain/main.go:46 +0x38
Sep 13 21:50:16 node-3 etcd[4197]: main.main()
Sep 13 21:50:16 node-3 etcd[4197]: /tmp/etcd-release-3.4.10/etcd/release/etcd/main.go:28 +0x20
Sep 13 21:50:16 node-3 systemd[1]: etcd.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 13 21:50:16 node-3 systemd[1]: Failed to start etcd.
-- Subject: Unit etcd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel

-- Unit etcd.service has failed.

-- The result is failed.
Sep 13 21:50:16 node-3 systemd[1]: Unit etcd.service entered failed state.
Sep 13 21:50:16 node-3 systemd[1]: etcd.service failed.
Sep 13 21:50:18 node-3 kubelet[1157]: I0913 21:50:18.489640 1157 kubelet.go:1962] SyncLoop (container unhealthy): "calico-node-wjmxb_kube-system(7294afba-6baa-4fc1-9516-7a2070c5f3cd)"
Sep 13 21:50:18 node-3 kubelet[1157]: I0913 21:50:18.489939 1157 scope.go:95] [topologymanager] RemoveContainer - Container ID: 86e03fcee1422758c6fe3b109151eb42f752a7a78f1f4d49b8847255a0240551
Sep 13 21:50:18 node-3 kubelet[1157]: E0913 21:50:18.490346 1157 pod_workers.go:191] Error syncing pod 7294afba-6baa-4fc1-9516-7a2070c5f3cd ("calico-node-wjmxb_kube-system(7294afba-6baa-4fc1-9516-7a2070c5f3cd)"), skipping: failed
Sep 13 21:50:18 node-3 kubelet[1157]: I0913 21:50:18.693286 1157 kuberuntime_manager.go:457] No ready sandbox for pod "coredns-84646c885d-2ff5b_kube-system(f8f1f866-d07c-4b24-b647-a12263c6c51f)" can be found. Need to start a new o
Sep 13 21:50:18 node-3 containerd[1007]: time="2022-09-13T21:50:18.693736482+08:00" level=info msg="StopPodSandbox for "4c24dae2c44c242d8330253aacd264d35e1f23432632986eb526cf72400a1154""
Sep 13 21:50:18 node-3 containerd[1007]: time="2022-09-13T21:50:18.693821378+08:00" level=info msg="Container to stop "d92c2aec55a5bf505b91184fbd0b4592026a57f1ab701e70c9b1638772a3cf8e" must be in running or unknown state, current s
Sep 13 21:50:18 node-3 containerd[1007]: time="2022-09-13T21:50:18.836753352+08:00" level=error msg="StopPodSandbox for "4c24dae2c44c242d8330253aacd264d35e1f23432632986eb526cf72400a1154" failed" error="failed to destroy network for
Sep 13 21:50:18 node-3 kubelet[1157]: E0913 21:50:18.837071 1157 remote_runtime.go:143] StopPodSandbox "4c24dae2c44c242d8330253aacd264d35e1f23432632986eb526cf72400a1154" from runtime service failed: rpc error: code = Unknown desc
Sep 13 21:50:18 node-3 kubelet[1157]: E0913 21:50:18.837105 1157 kuberuntime_manager.go:923] Failed to stop sandbox {"containerd" "4c24dae2c44c242d8330253aacd264d35e1f23432632986eb526cf72400a1154"}
Sep 13 21:50:18 node-3 kubelet[1157]: E0913 21:50:18.837144 1157 kuberuntime_manager.go:702] killPodWithSyncResult failed: failed to "KillPodSandbox" for "f8f1f866-d07c-4b24-b647-a12263c6c51f" with KillPodSandboxError: "rpc error:
Sep 13 21:50:18 node-3 kubelet[1157]: E0913 21:50:18.837162 1157 pod_workers.go:191] Error syncing pod f8f1f866-d07c-4b24-b647-a12263c6c51f ("coredns-84646c885d-2ff5b_kube-system(f8f1f866-d07c-4b24-b647-a12263c6c51f)"), skipping:

1 answer

刘果国

2022-09-15

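Don't read all the logs mixed together. etcd does not depend on any other component, so look only at etcd's own logs. You can also start etcd in the foreground to analyze the complete log.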
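As a rough sketch of that advice (not an official procedure), the two shell helpers below pull only the etcd lines out of a mixed journal dump and print the first panic line. The function names `etcd_only` and `first_panic` are made up for illustration, and the pattern assumes the `etcd[PID]:` line prefix seen in the output pasted in the question.

```shell
# Hypothetical helpers: read a mixed journal dump on stdin.

# Keep only lines written by the etcd process itself, e.g. "node-3 etcd[4197]: ...".
# Lines from kubelet, containerd, and systemd are dropped.
etcd_only() {
  grep -E ' etcd\[[0-9]+\]: '
}

# Print the first etcd panic line. In the output above, the second panic
# (nil pointer dereference) comes after the snapshot error, so the first
# one is the more useful starting point.
first_panic() {
  etcd_only | grep -m 1 'panic:'
}

# Example usage on the affected node (commented out so the sketch has no side effects):
#   journalctl -u etcd.service -b --no-pager      # etcd unit only, current boot
#   journalctl -xe --no-pager | first_panic       # filter a mixed dump
```

To get the complete log in the foreground, one option is to stop the unit with `systemctl stop etcd` and then run the `ExecStart` command line from `/etc/systemd/system/etcd.service` directly in a terminal, so that everything etcd prints, including the full panic trace, appears unfiltered and untruncated.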
