Study summary, 2019-01-10

Notes on the Transformer model in NLP. This exercise is based on the TensorFlow 2.0 tutorials; see the official API documentation for details.

The Transformer in the official TensorFlow tutorial

1. Preparing the dataset

2. Building the Transformer

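encoder ==> built mainly from a stack of basic EncoderLayer blocks; each EncoderLayer consists of multi-head attention followed by a feed-forward network.

The input to multi-head attention has shape (batch_size, seq_len, d_model), which is split into (batch_size, seq_len, num_heads, depth), where num_heads is the number of heads and num_heads * depth = d_model. Because the inputs were padded to a common seq_len, multi-head attention needs a mask to cancel the effect of the padding values. Stacking n EncoderLayers on top of the input embedding and positional encoding gives the encoder.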
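The reshape and the padding mask described above can be sketched roughly as follows. This is a NumPy illustration, not the tutorial's TensorFlow code: the function names `split_heads` and `create_padding_mask` are chosen for this sketch, and it assumes that padding uses token id 0.

```python
import numpy as np

def split_heads(x, num_heads):
    """Reshape (batch_size, seq_len, d_model) into (batch_size, seq_len, num_heads, depth)."""
    batch_size, seq_len, d_model = x.shape
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    depth = d_model // num_heads
    return x.reshape(batch_size, seq_len, num_heads, depth)

def create_padding_mask(token_ids, pad_id=0):
    """Return 1.0 at padded positions, shaped (batch_size, 1, 1, seq_len) for broadcasting."""
    mask = (token_ids == pad_id).astype(np.float32)  # pad_id=0 is an assumption
    return mask[:, np.newaxis, np.newaxis, :]

x = np.zeros((2, 5, 8), dtype=np.float32)         # batch_size=2, seq_len=5, d_model=8
heads = split_heads(x, num_heads=4)               # depth = 8 / 4 = 2
token_ids = np.array([[7, 6, 0, 0, 0], [1, 2, 3, 0, 0]])
mask = create_padding_mask(token_ids)
```

Implementations usually go on to transpose the result to (batch_size, num_heads, seq_len, depth) so that each head attends over the sequence independently; the sketch stops at the shape named in the text.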

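decoder ==> built mainly from a stack of basic DecoderLayer blocks; each DecoderLayer consists of masked multi-head attention, multi-head attention, and a feed-forward network. The masked multi-head attention takes tar_inp as input and is used to obtain tar_prediction. Its mask is mainly the look-ahead mask, which hides the later tokens during prediction. For example, to predict the second word only the first word is used; to predict the third, the first and second words are used.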
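A look-ahead mask of this kind might be built as in the sketch below. It uses NumPy instead of TensorFlow, the name `create_look_ahead_mask` is chosen for illustration, and it assumes the convention that 1.0 marks a position that must not be attended to.

```python
import numpy as np

def create_look_ahead_mask(size):
    """Return a (size, size) matrix with 1.0 where a position must NOT be attended to."""
    # Row i keeps columns 0..i visible (0.0) and hides every later column (1.0).
    return 1.0 - np.tril(np.ones((size, size), dtype=np.float32))

mask = create_look_ahead_mask(3)
```

Row 0 can see only the first token, row 1 the first two, and row 2 all three, which matches the word-by-word description above.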

For example, as the tutorial puts it:

The target is divided into tar_inp and tar_real. tar_inp is passed as an input to the decoder. tar_real is that same input shifted by 1: at each location in tar_input, tar_real contains the next token that should be predicted.

For example, sentence = "sos a lion in the jungle is sleeping eos"

tar_inp = "sos a lion in the jungle is sleeping"

tar_real = "a lion in the jungle is sleeping eos"
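The shift can be reproduced on the example sentence with plain Python slicing. This is only a sketch on whitespace-separated words; in training the same slicing would be applied to batches of token ids.

```python
sentence = "sos a lion in the jungle is sleeping eos"
tokens = sentence.split()

tar_inp = tokens[:-1]   # everything except the final token
tar_real = tokens[1:]   # the same sequence shifted left by one

# At each position i, tar_real[i] is the token that should follow tar_inp[i].
pairs = list(zip(tar_inp, tar_real))
```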

The second multi-head attention then takes the encoder output and the output of the masked multi-head attention as its inputs; its mask is the padding mask.
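How a mask enters the attention computation can be sketched with a single-head scaled dot-product attention in NumPy. The assumptions here are that queries come from the decoder side, that keys and values come from the encoder output, and that masked positions are suppressed by adding a large negative number to the logits; the variable names and sizes are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(q k^T / sqrt(depth)) v; mask is 1.0 where attention is not allowed."""
    depth = q.shape[-1]
    logits = q @ np.swapaxes(k, -1, -2) / np.sqrt(depth)
    if mask is not None:
        logits = logits + mask * -1e9                      # push masked logits toward -inf
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
dec_out = rng.normal(size=(1, 3, 4))   # queries: (batch, tar_len, depth)
enc_out = rng.normal(size=(1, 5, 4))   # keys and values: (batch, inp_len, depth)
padding_mask = np.array([[[0.0, 0.0, 0.0, 1.0, 1.0]]])  # last two source tokens are padding
output, weights = scaled_dot_product_attention(dec_out, enc_out, enc_out, padding_mask)
```

Each target position ends up with a weight distribution over the source positions in which the padded ones receive essentially zero weight.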

Adding an embedding and positional encoding in front of the stacked DecoderLayers gives the decoder.
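The notes do not spell out the positional encoding itself. Assuming it is the usual sinusoidal form, with sine on even dimensions and cosine on odd ones, a NumPy sketch could look like this; `max_len` and the sizes used are illustrative values.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal encoding of shape (max_len, d_model); assumed form, not taken from the notes."""
    positions = np.arange(max_len)[:, np.newaxis]    # (max_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]         # (1, d_model)
    # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((max_len, d_model), dtype=np.float32)
    encoding[:, 0::2] = np.sin(angles[:, 0::2])      # even dimensions
    encoding[:, 1::2] = np.cos(angles[:, 1::2])      # odd dimensions
    return encoding

pe = positional_encoding(max_len=50, d_model=16)
```

The encoding is added to the token embeddings so that otherwise order-agnostic attention layers can tell positions apart.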

Both EncoderLayer and DecoderLayer also add residual connections, which guard against the degradation that comes with very deep networks, followed by LayerNormalization.
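The "add, then normalize" step can be illustrated in NumPy as below. This is a simplified sketch: the learned scale and shift of a real layer-normalization layer are omitted, and `add_and_norm` is a name made up for this example.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the last axis (learned scale and shift omitted for brevity)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def add_and_norm(x, sublayer_output):
    """Residual connection followed by layer normalization."""
    return layer_norm(x + sublayer_output)

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))                 # input to the sublayer
sublayer_output = rng.normal(size=(2, 5, 8))   # e.g. attention or feed-forward output
out = add_and_norm(x, sublayer_output)
```

Because the sublayer output is added to its own input, the sublayer only has to learn a correction to the identity mapping, which is what keeps deep stacks trainable.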

A linear layer and softmax are applied to the decoder output to produce the final output.
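As a rough picture of this last step, the NumPy sketch below projects each decoder position onto a vocabulary and normalizes with softmax. The weights are random placeholders and `vocab_size` is an arbitrary illustrative value; in a trained model the projection would be learned.

```python
import numpy as np

def final_projection(decoder_output, weights, bias):
    """Linear layer followed by softmax over the vocabulary axis."""
    logits = decoder_output @ weights + bias
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20                       # illustrative sizes
decoder_output = rng.normal(size=(2, 5, d_model))
weights = rng.normal(size=(d_model, vocab_size))  # placeholder, not trained
bias = np.zeros(vocab_size)

probs = final_projection(decoder_output, weights, bias)
predicted_ids = probs.argmax(axis=-1)             # most likely token per position
```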

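The optimizer uses a custom learning-rate schedule to help training converge; its trend is shown in the figure below.

[Figure: learning-rate schedule]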
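The notes do not give the schedule's formula. Assuming it is the warmup schedule commonly paired with the Transformer, lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), a plain-Python sketch is shown here; the defaults `d_model=128` and `warmup_steps=4000` are illustrative assumptions.

```python
def transformer_learning_rate(step, d_model=128, warmup_steps=4000):
    """Linear warmup followed by inverse-square-root decay (assumed schedule)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# The rate rises until warmup_steps and decays afterwards.
rates = [transformer_learning_rate(s) for s in (1, 1000, 4000, 8000, 40000)]
```

Under this assumption the rate peaks exactly at `warmup_steps`, where the two arguments of `min` are equal.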
