Teacher, how long did your model training take to run? Mine has been stuck at step=1 and won't move

Source: 6-23 TensorFlow-ssd model training, hands-on (3)

linhbo

2019-08-06

2019-08-06 14:13:18.824773: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-06 14:13:18.845406: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-06 14:13:18.961147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-06 14:13:18.962183: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xfefb850 executing computations on platform CUDA. Devices:
2019-08-06 14:13:18.962220: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1070, Compute Capability 6.1
2019-08-06 14:13:18.997563: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2904000000 Hz
2019-08-06 14:13:18.998669: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xfefcba0 executing computations on platform Host. Devices:
2019-08-06 14:13:18.998729: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-08-06 14:13:18.999199: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-06 14:13:19.000223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.683 pciBusID: 0000:01:00.0
2019-08-06 14:13:19.000824: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:
2019-08-06 14:13:19.001079: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:
2019-08-06 14:13:19.001305: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:
2019-08-06 14:13:19.001520: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:
2019-08-06 14:13:19.002034: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:
2019-08-06 14:13:19.002252: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64:
2019-08-06 14:13:19.099775: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-06 14:13:19.099863: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-08-06 14:13:19.099908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-06 14:13:19.099933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-08-06 14:13:19.099954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2019-08-06 14:13:20.291013: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0806 14:13:21.129667 140537438725952 session_manager.py:500] Running local_init_op.
I0806 14:13:21.288702 140537438725952 session_manager.py:502] Done running local_init_op.
I0806 14:13:26.436102 140537438725952 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /home/lhb/models/datasets/widerface/resnet50v1-fpn/model.ckpt.
I0806 14:13:46.865307 140537438725952 basic_session_run_hooks.py:262] loss = 9.194439, step = 1

Is this an error, or is it still training? It has stopped moving entirely.
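
A log like the one above can be checked mechanically. The following is a minimal sketch, not a definitive diagnosis: the helper name `summarize_gpu_log` is hypothetical, and the sample string is an abbreviated excerpt of the log in this post. It scans for the `Could not dlopen library` messages and for the `Skipping registering GPU devices` warning. In this log TensorFlow looks for `.so.10.0` libraries while `LD_LIBRARY_PATH` points at `/usr/local/cuda-10.1/lib64`; when that warning appears, the GPU is not registered, so training is most likely running on the CPU, which may be why the step counter advances so slowly.

```python
import re

# Message TensorFlow's dso_loader prints when a shared library fails to load.
DLOPEN_FAIL = re.compile(r"Could not dlopen library '([^']+)'")
# Warning printed when TensorFlow gives up on registering the GPU.
GPU_SKIPPED = "Skipping registering GPU devices"


def summarize_gpu_log(log_text):
    """Return (missing_libraries, gpu_skipped) for a TensorFlow startup log.

    Hypothetical helper for illustration; it only inspects the text.
    """
    missing = DLOPEN_FAIL.findall(log_text)
    return missing, GPU_SKIPPED in log_text


# Abbreviated excerpt of the log from this post (assumption: shortened by hand).
sample = (
    "Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0\n"
    "Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0\n"
    "Cannot dlopen some GPU libraries. Skipping registering GPU devices...\n"
)

missing, skipped = summarize_gpu_log(sample)
print("missing libraries:", missing)
print("GPU registration skipped:", skipped)
```

To use it on a real run, save the training output to a file and pass the file's contents to `summarize_gpu_log`.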


1 answer