Question about Q, K, V in the decoder
Source: 10-23 DecoderLayer implementation
qq_慕前端4252840
2021-08-11
Teacher, in the decoder I see that you use the encoder outputs as the query and the decoder's MHA output as the key and value. What is the theoretical basis for doing it this way? What do query, key, and value actually mean? And why does arranging them like this compute the attention between the encoder and the decoder?
2 answers
慕勒8140236
2021-09-16
My personal understanding: in terms of translation, the physical meaning is that the Query is "我稀罕你" (a colloquial way of saying "I like you"), the Key is "我爱你" ("I love you"), and the Value is "I love u".
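Read as a retrieval problem: the query is the phrase we want to translate, the keys are phrases whose translations we already know, and the values are those translations. Attention is a soft lookup, where the query is compared with every key and the values are mixed according to how well their keys match. Below is a minimal NumPy sketch of that idea; every embedding and number is invented purely for illustration:

import numpy as np

# Made-up 2-D embeddings; all numbers are invented for illustration.
query = np.array([1.0, 0.2])                     # the phrase to translate, e.g. "我稀罕你"
keys = np.array([[0.9, 0.3],                     # a known phrase with similar meaning, e.g. "我爱你"
                 [-0.5, 0.8]])                   # an unrelated phrase
values = np.array([[1.0, 0.0, 0.0],              # stand-in for the translation "I love u"
                   [0.0, 1.0, 0.0]])             # stand-in for the other phrase's translation

scores = keys @ query                             # similarity between the query and every key
weights = np.exp(scores) / np.exp(scores).sum()   # softmax -> attention weights
output = weights @ values                         # values mixed according to how well their keys match

print(weights)   # most of the weight falls on the "我爱你" entry
print(output)    # so the output is close to the "I love u" value

Because the query embedding sits closest to the "我爱你" key, the output is dominated by the corresponding "I love u" value, which is the intuition behind the analogy above.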
正十七
2021-09-04
Our DecoderLayer's call function is implemented as follows:
def call(self, x, encoding_outputs, training,
         decoder_mask, encoder_decoder_padding_mask):
    # decoder_mask: combination of look_ahead_mask and decoder_padding_mask
    # x.shape: (batch_size, target_seq_len, d_model)
    # encoding_outputs.shape: (batch_size, input_seq_len, d_model)

    # attn1, out1.shape: (batch_size, target_seq_len, d_model)
    attn1, attn_weights1 = self.mha1(x, x, x, decoder_mask)
    attn1 = self.dropout1(attn1, training=training)
    out1 = self.layer_norm1(attn1 + x)

    # attn2, out2.shape: (batch_size, target_seq_len, d_model)
    attn2, attn_weights2 = self.mha2(
        out1, encoding_outputs, encoding_outputs,
        encoder_decoder_padding_mask)
    attn2 = self.dropout2(attn2, training=training)
    out2 = self.layer_norm2(attn2 + out1)

    # ffn_output, out3.shape: (batch_size, target_seq_len, d_model)
    ffn_output = self.ffn(out2)
    ffn_output = self.dropout3(ffn_output, training=training)
    out3 = self.layer_norm3(ffn_output + out2)

    return out3, attn_weights1, attn_weights2
In self.mha2, out1 is the query and encoding_outputs are the key and value.

The query and the key are used to compute the attention weights, and the value is what those weights are multiplied with. The physical meaning is: at each step, the decoder computes a relevance score (an attention weight) against every encoder output, then uses those scores to take a weighted sum of the encoder outputs; that weighted sum is exactly the information from the encoder that this decoder step needs, and it is passed on to the following computation.
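To make this concrete, here is a minimal single-head sketch of the scaled dot-product attention that happens inside self.mha2. This is not the course's MultiHeadAttention class; the variable names out1 and encoding_outputs simply mirror the code above, and the sizes are made up:

import tensorflow as tf

batch_size, input_seq_len, target_seq_len, d_model = 2, 6, 4, 8

# stand-ins for the real tensors
out1 = tf.random.normal((batch_size, target_seq_len, d_model))             # decoder side -> query
encoding_outputs = tf.random.normal((batch_size, input_seq_len, d_model))  # encoder side -> key & value

q, k, v = out1, encoding_outputs, encoding_outputs

# relevance of every decoder step to every encoder step
scores = tf.matmul(q, k, transpose_b=True)                  # (batch, target_seq_len, input_seq_len)
scores = scores / tf.math.sqrt(tf.cast(d_model, tf.float32))
attention_weights = tf.nn.softmax(scores, axis=-1)          # each decoder step sums to 1 over encoder steps

# weighted sum of encoder outputs: the encoder information each decoder step needs
attn2 = tf.matmul(attention_weights, v)                     # (batch, target_seq_len, d_model)

print(attention_weights.shape)  # (2, 4, 6)
print(attn2.shape)              # (2, 4, 8)

Here attention_weights plays the same role as attn_weights2 returned by the layer: one row per decoder (target) position, one column per encoder (input) position, and each row sums to 1.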