Gradients not updating

Source: 2-8 Neural Network Implementation (Multi-class Logistic Regression Model Implementation)

qq_雨后天晴_0

2019-10-17

Hi teacher, I tried swapping the dataset for Fashion-MNIST and changed the code accordingly, but W and b don't update. Could you take a look for me?

import tensorflow as tf
import numpy as np
from keras.datasets import fashion_mnist

# Load Fashion-MNIST and scale pixel values to [0, 1]
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()
train_data = train_data / 255.0
test_data = test_data / 255.0

# Input pipeline: feed the full arrays in through placeholders,
# then shuffle/batch/repeat with tf.data
tf.reset_default_graph()
X_all = tf.placeholder(tf.float32, shape=(None, 28, 28))
y_all = tf.placeholder(tf.int64, shape=(None,))

dataset = tf.data.Dataset.from_tensor_slices((X_all, y_all))
batch_size = 256
dataset = dataset.shuffle(2).batch(batch_size).repeat()
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

# Placeholders for one batch of images and labels
X = tf.placeholder(tf.float32, shape=(None, 28, 28))
y = tf.placeholder(tf.int64, shape=(None,))

# Model: one hidden ReLU layer (784 -> 256) and a linear output layer (256 -> 10)
W_1 = tf.Variable(np.random.normal(0, 0.01, (784, 256)), name='W_1', dtype=tf.float32)
b_1 = tf.Variable(np.zeros(shape=[256]), name='b_1', dtype=tf.float32)
W_2 = tf.Variable(np.random.normal(0, 0.01, (256, 10)), name='W_2', dtype=tf.float32)
b_2 = tf.Variable(np.zeros(shape=[10]), name='b_2', dtype=tf.float32)
H = tf.nn.relu(tf.matmul(tf.reshape(X, (-1, 784)), W_1) + b_1)
O = tf.matmul(H, W_2) + b_2
output = tf.nn.softmax(O)

# Prediction, accuracy, loss, and optimizer
predict = tf.argmax(output, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predict, y), tf.float64))

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=output))
optimizer = tf.train.AdamOptimizer(0.5).minimize(loss)
init = tf.global_variables_initializer()

# Training loop: run one epoch, printing part of W_1 every step
num_epochs = 1
steps_per_epoch = train_data.shape[0] // batch_size + 1
train_loss, train_acc, n = 0.0, 0.0, 0
with tf.Session() as sess:
    sess.run(init)
    sess.run(iterator.initializer, feed_dict={X_all: train_data, y_all: train_labels})
    for step in range(num_epochs * steps_per_epoch):
        x_batch, y_batch = sess.run(next_element)
        _, l, acc = sess.run([optimizer, loss, accuracy], feed_dict={X: x_batch, y: y_batch})
        train_loss += l
        train_acc += acc
        n += x_batch.shape[0]
        # watch the first four weights of W_1's first row for updates
        print(sess.run(W_1)[0][:4])
        if (step + 1) % 10 == 0:
            # evaluate on the full train/test sets every 10 steps
            test_acc, test_pre = sess.run([accuracy, predict], feed_dict={X: test_data, y: test_labels})
            train_acc, train_pre = sess.run([accuracy, predict], feed_dict={X: train_data, y: train_labels})
            test_pre_pre = test_pre
            train_pre_pre = train_pre
            print('epoch{:d}:loss {:.4f} train_acc {:.4f} test_acc {:.4f}'.format((step + 1)//steps_per_epoch, l, acc, test_acc))

Here is the output:

[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
epoch0:loss 2.3830 train_acc 0.0781 test_acc 0.1000
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
[ 0.00772863  0.0250773  -0.00857907 -0.00014446]
epoch0:loss 2.3752 train_acc 0.0859 test_acc 0.1000

2 Answers

qq_雨后天晴_0

Original poster

2019-10-17

OK, thank you, teacher!


正十七

2019-10-17

Hi! Looking at your output, the loss and accuracy actually are changing. My guess is that you simply haven't run enough iterations; try iterating more and see whether the weight matrix changes.
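A quick way to run that check (a minimal sketch reusing sess, W_1, next_element, optimizer, X, and y from the snippet above): snapshot W_1, run some extra steps, and compare the arrays numerically, since an update smaller than the printed precision would look frozen on the console.

import numpy as np

# Snapshot W_1, run more training steps, then measure the change
# numerically; printing truncates small updates.
w1_before = sess.run(W_1)
for _ in range(1000):
    x_batch, y_batch = sess.run(next_element)
    sess.run(optimizer, feed_dict={X: x_batch, y: y_batch})
w1_after = sess.run(W_1)
print('max |delta W_1| =', np.abs(w1_after - w1_before).max())
print('W_1 unchanged?', np.allclose(w1_before, w1_after))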

Also, another possibility is that W_1 sits in the bottom layer, and the gradient that reaches it may not be large enough. Try printing the values of W_2 and see how much they change.
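A sketch of that check, again against the variables in the snippet above. One likely explanation worth verifying: row 0 of W_1 holds the weights for the top-left corner pixel, and in Fashion-MNIST that pixel is 0 in essentially every image, so the gradient for that row (which scales with the pixel value) is exactly zero and Adam never moves it, however many steps you run. W_2, or an interior row of W_1, should show movement:

# W_2 receives gradient from every sample, so it should visibly change:
print(sess.run(W_2)[0][:4])
# An interior row of W_1 (index 400 = pixel row 14, column 8, a central
# pixel that is usually nonzero) should also move, unlike row 0:
print(sess.run(W_1)[400][:4])

# Two details in the snippet also worth double-checking (suggestions,
# not confirmed by this thread):
# 1) tf.nn.sparse_softmax_cross_entropy_with_logits expects the raw
#    logits O, not the softmax probabilities `output`; feeding the
#    probabilities applies softmax twice and flattens the gradients:
# loss = tf.reduce_mean(
#     tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=O))
# 2) 0.5 is a very large learning rate for Adam; 1e-3 is the usual default:
# optimizer = tf.train.AdamOptimizer(1e-3).minimize(loss)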

