Gradients not updating
Source: 2-8 Neural Network Implementation (multi-class logistic regression model implementation)
qq_雨后天晴_0
2019-10-17
Teacher, I tried switching the dataset to Fashion-MNIST and changed the code a bit, but W and b don't update. Could you take a look for me?
import tensorflow as tf
import os
import _pickle as cPickle
import numpy as np
from keras.datasets import fashion_mnist

# Load Fashion-MNIST and scale pixel values to [0, 1]
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()
train_data = train_data / 255.0
test_data = test_data / 255.0

tf.reset_default_graph()

# Feed the whole dataset in through placeholders and batch it with tf.data
X_all = tf.placeholder(tf.float32, shape=(None, 28, 28))
y_all = tf.placeholder(tf.int64, shape=(None,))
dataset = tf.data.Dataset.from_tensor_slices((X_all, y_all))
batch_size = 256
dataset = dataset.shuffle(2).batch(batch_size).repeat()
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

# Placeholders for the batches actually fed to the model
X = tf.placeholder(tf.float32, shape=(None, 28, 28))
y = tf.placeholder(tf.int64, shape=(None,))

# One hidden layer: 784 -> 256 -> 10
W_1 = tf.Variable(np.random.normal(0, 0.01, (784, 256)), name='W_1', dtype=tf.float32)
b_1 = tf.Variable(np.zeros(shape=[256]), name='b_1', dtype=tf.float32)
W_2 = tf.Variable(np.random.normal(0, 0.01, (256, 10)), name='W_2', dtype=tf.float32)
b_2 = tf.Variable(np.zeros(shape=[10]), name='b_2', dtype=tf.float32)

H = tf.nn.relu(tf.matmul(tf.reshape(X, (-1, 784)), W_1) + b_1)
O = tf.matmul(H, W_2) + b_2
output = tf.nn.softmax(O)
predict = tf.argmax(output, 1)
accuracy = tf.reduce_mean(tf.cast(tf.equal(predict, y), tf.float64))
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=output))
optimizer = tf.train.AdamOptimizer(0.5).minimize(loss)
init = tf.global_variables_initializer()

num_epochs = 1
steps_per_epoch = train_data.shape[0] // batch_size + 1
train_loss, train_acc, n = 0.0, 0.0, 0

with tf.Session() as sess:
    sess.run(init)
    sess.run(iterator.initializer, feed_dict={X_all: train_data, y_all: train_labels})
    for step in range(num_epochs * steps_per_epoch):
        x_batch, y_batch = sess.run(next_element)
        _, l, acc = sess.run([optimizer, loss, accuracy],
                             feed_dict={X: x_batch, y: y_batch})
        train_loss += l
        train_acc += acc
        n += x_batch.shape[0]
        # Print the first four entries of W_1's first row to check whether they change
        print(sess.run(W_1)[0][:4])
        if ((step + 1) % 10 == 0) and not (step == 0):
            test_acc, test_pre = sess.run([accuracy, predict], feed_dict={X: test_data, y: test_labels})
            train_acc, train_pre = sess.run([accuracy, predict], feed_dict={X: train_data, y: train_labels})
            test_pre_pre = test_pre
            train_pre_pre = train_pre
            print('epoch{:d}:loss {:.4f} train_acc {:.4f} test_acc {:.4f}'.format((step + 1) // steps_per_epoch, l, acc, test_acc))
Here is the output:
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
epoch0:loss 2.3830 train_acc 0.0781 test_acc 0.1000
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
[ 0.00772863 0.0250773 -0.00857907 -0.00014446]
epoch0:loss 2.3752 train_acc 0.0859 test_acc 0.1000
2 answers
qq_雨后天晴_0
Original poster
2019-10-17
OK, thank you, teacher!
正十七
2019-10-17
Hi, from your output the loss and the accuracy actually are changing. My guess is that you haven't iterated enough times yet. Try running more iterations and see whether this parameter matrix starts to change.
Another possibility is that W_1 sits in the bottom layer, and the gradient propagated down to it may not be large enough. Try printing the values of W_2 as well and see how much they change, as in the sketch below.
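For example, here is a minimal, self-contained sketch of that check (it uses random data and a toy model with the same layer shapes as the question's code, not the course code itself): it tracks the mean absolute change of W_1 and W_2 per step, which is a more reliable signal than eyeballing four individual entries.

import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# Toy two-layer model (784 -> 256 -> 10), just to compare how much
# gradient reaches each layer.
X = tf.placeholder(tf.float32, shape=(None, 784))
y = tf.placeholder(tf.int64, shape=(None,))
W_1 = tf.Variable(np.random.normal(0, 0.01, (784, 256)), dtype=tf.float32)
b_1 = tf.Variable(np.zeros(256), dtype=tf.float32)
W_2 = tf.Variable(np.random.normal(0, 0.01, (256, 10)), dtype=tf.float32)
b_2 = tf.Variable(np.zeros(10), dtype=tf.float32)
logits = tf.matmul(tf.nn.relu(tf.matmul(X, W_1) + b_1), W_2) + b_2
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# Random stand-in batch; in the question's code this would be x_batch, y_batch
x_batch = np.random.rand(256, 784).astype(np.float32)
y_batch = np.random.randint(0, 10, size=256)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    w1_prev, w2_prev = sess.run([W_1, W_2])
    for step in range(50):
        sess.run(train_op, feed_dict={X: x_batch, y: y_batch})
        w1, w2 = sess.run([W_1, W_2])
        # Mean absolute change per step: a direct measure of how much
        # each weight matrix actually moves.
        print('step {:2d}  |dW_1| {:.6f}  |dW_2| {:.6f}'.format(
            step, np.abs(w1 - w1_prev).mean(), np.abs(w2 - w2_prev).mean()))
        w1_prev, w2_prev = w1, w2

If |dW_2| is consistently much larger than |dW_1|, that supports the hypothesis that too little gradient reaches the bottom layer, rather than no gradient flowing at all.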