Rough Notes from Neural Network Training in Practice

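My experiments mainly followed this blog post: The Number of Hidden Layers.

Start with 1 hidden layer, that is, train a 3-layer neural network (input, hidden, output).

Then add 1 more hidden layer and compare the results on the training set, test set, and so on. If there is almost no improvement, just keep optimizing the 3-layer network.

After all, every added layer costs training time and compute, and both have to be taken into account.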
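As a rough sketch of the depth comparison above, the helper below builds a fully connected PyTorch network with a configurable number of hidden layers. The name build_mlp, the layer sizes (30 inputs, 22 hidden units, 4 outputs), and the ReLU activation are assumptions for illustration and are not taken from the original experiments.

```python
import torch
from torch import nn


def build_mlp(n_in, n_hidden, n_out, n_hidden_layers=1):
    """Build a fully connected net with n_hidden_layers hidden layers (hypothetical helper)."""
    layers = []
    width = n_in
    for _ in range(n_hidden_layers):
        layers.append(nn.Linear(width, n_hidden))
        layers.append(nn.ReLU())  # activation choice is an assumption
        width = n_hidden
    layers.append(nn.Linear(width, n_out))  # output layer returns raw logits
    return nn.Sequential(*layers)


def count_params(model):
    """Total number of trainable values, a crude proxy for the extra cost of depth."""
    return sum(p.numel() for p in model.parameters())


# Illustrative sizes only.
shallow = build_mlp(30, 22, 4, n_hidden_layers=1)  # the "3-layer" network
deeper = build_mlp(30, 22, 4, n_hidden_layers=2)   # one extra hidden layer
```

Training both models with identical settings and comparing train and test metrics next to count_params gives a simple basis for deciding whether the extra layer is worth it.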

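The linked post gives three rules of thumb, so I simply started with 2/3 * (number of input-layer neurons + number of output-layer neurons) as the number of hidden-layer neurons.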
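The rule of thumb above is a one-line calculation. The sketch below assumes rounding to the nearest integer and a minimum of one neuron; neither detail is specified in the text, and the function name hidden_neurons is made up for illustration.

```python
def hidden_neurons(n_in, n_out):
    """Starting point from the text: 2/3 of (input neurons + output neurons)."""
    # Rounding and the lower bound of 1 are assumptions.
    return max(1, round(2 * (n_in + n_out) / 3))


# Illustrative sizes: 30 input features and 4 output classes.
start = hidden_neurons(30, 4)
```

The value is only a starting point; nearby sizes can then be tried and compared on the test metric.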

number of hidden layer neurons: 9    test_precision: 0.9969
number of hidden layer neurons: 21   test_precision: 0.9974
number of hidden layer neurons: 15   test_precision: 0.9971
number of hidden layer neurons: 22   test_precision: 0.9972
number of hidden layer neurons: 8    test_precision: 0.9968
number of hidden layer neurons: 21   test_precision: 0.9974
number of hidden layer neurons: 14   test_precision: 0.9971
number of hidden layer neurons: 10   test_precision: 0.9973
number of hidden layer neurons: 17   test_precision: 0.9971
number of hidden layer neurons: 19   test_precision: 0.9971

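I usually work with RMSprop, Adam, and SGD; in practice, the learning rate still has the biggest impact.

I recently looked into torch.optim.lr_scheduler, which contains many learning-rate (lr) adjustment strategies such as LambdaLR and MultiplicativeLR. In actual training, however, I found that adding these schedulers can sometimes even backfire.

Here I use a combined strategy: early_stopping + StepLR.

That is, monitor a metric such as val_loss; when it has failed to improve for a certain number of checks during training (the early_stopping condition is met), multiply the learning rate by factor;

also set a limit on how many times the early_stopping condition may be met, and exit training automatically once that limit is reached.

I find this fairly efficient.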
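The strategy above can be sketched as a small framework-independent controller. The class name PlateauController, its parameter names, and the default values are all hypothetical, and the val_loss history is made up. This is one reading of the described behavior, not the author's actual code.

```python
class PlateauController:
    """Hypothetical helper: decay the lr on each plateau, stop after too many plateaus."""

    def __init__(self, lr, factor=0.5, patience=3, max_plateaus=4):
        self.lr = lr
        self.factor = factor              # lr multiplier applied on a plateau
        self.patience = patience          # checks without improvement that make a plateau
        self.max_plateaus = max_plateaus  # plateaus allowed before training stops
        self.best = float("inf")
        self.bad_checks = 0
        self.plateaus = 0

    def step(self, val_loss):
        """Record one validation result; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_checks = 0
            return False
        self.bad_checks += 1
        if self.bad_checks >= self.patience:
            self.bad_checks = 0
            self.plateaus += 1
            if self.plateaus >= self.max_plateaus:
                return True
            self.lr *= self.factor
        return False


ctrl = PlateauController(lr=0.1, factor=0.5, patience=2, max_plateaus=2)
history = [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7]  # made-up val_loss values
stopped_at = None
for epoch, loss in enumerate(history):
    if ctrl.step(loss):
        stopped_at = epoch
        break
```

In a PyTorch loop, the updated ctrl.lr would be written into the "lr" entry of each group in optimizer.param_groups after every validation pass. The built-in torch.optim.lr_scheduler.ReduceLROnPlateau covers the decay-on-plateau half of this behavior on its own.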

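When the training set is fairly large and the available compute is limited (for example, CPU only), mini-batch SGD is still practical to run, and thanks to the randomness in how it selects data, the final results are not bad either.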
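The random selection mentioned above amounts to reshuffling the data each epoch and walking through it in small chunks. A minimal pure-Python sketch follows; the name minibatches and the toy data are assumptions for illustration.

```python
import random


def minibatches(samples, batch_size, seed=None):
    """Yield shuffled mini-batches that cover samples exactly once (one epoch)."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    rng.shuffle(order)  # new random order for this epoch
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


data = list(range(10))  # stand-in for real training samples
batches = list(minibatches(data, batch_size=4, seed=0))
```

Each parameter update then touches only batch_size samples, which keeps the cost per step low on a CPU. In PyTorch, torch.utils.data.DataLoader with shuffle=True plays the same role.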

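During training, I pulled out the samples the model predicted incorrectly, added correctly predicted samples numbering a fixed multiple of them, and restarted training on this new dataset.

Unfortunately, this did not improve the final result much.

It seems the samples that were misclassified in the initial training are quite likely noise points.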
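The resampling experiment described above could be set up roughly as follows. The function name build_retrain_set, the ratio of 2, and the toy labels and predictions are all assumptions; the text does not state which multiple was actually used.

```python
import random


def build_retrain_set(samples, labels, predictions, ratio=2, seed=0):
    """Keep every misclassified sample plus ratio times as many correct ones."""
    pairs = list(zip(labels, predictions))
    wrong = [i for i, (y, p) in enumerate(pairs) if y != p]
    right = [i for i, (y, p) in enumerate(pairs) if y == p]
    rng = random.Random(seed)
    n_right = min(len(right), ratio * len(wrong))  # cannot draw more than exist
    chosen = wrong + rng.sample(right, n_right)
    rng.shuffle(chosen)  # mix hard and easy samples
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Toy data: 10 samples, 2 of them (indices 3 and 8) misclassified.
samples = list(range(10))
labels = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
predictions = [0, 1, 2, 0, 0, 1, 2, 3, 1, 1]
new_x, new_y = build_retrain_set(samples, labels, predictions, ratio=2)
```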

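The current training dataset is a four-class classification task. I briefly compared a few loss functions, including torch.nn.SmoothL1Loss and BCEWithLogitsLoss, and found that BCEWithLogitsLoss performed relatively better.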
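For a four-class task, BCEWithLogitsLoss expects float targets with the same shape as the logits, so integer labels have to be one-hot encoded first. The sketch below uses random stand-in logits and labels. Applying SmoothL1Loss to sigmoid outputs is an assumption about how such a comparison might be set up, not something stated in the text.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 4)          # fake model outputs: 8 samples, 4 classes
labels = torch.randint(0, 4, (8,))  # fake integer class labels

# One-hot encode so the targets match the logits' shape and dtype.
targets = F.one_hot(labels, num_classes=4).float()

# BCEWithLogitsLoss applies the sigmoid internally, so it takes raw logits.
bce = nn.BCEWithLogitsLoss()(logits, targets)

# SmoothL1Loss is a regression loss; here it compares sigmoid outputs to the one-hot targets.
smooth_l1 = nn.SmoothL1Loss()(torch.sigmoid(logits), targets)
```

Both calls return scalar tensors, so either can be dropped into the same training loop for a side-by-side comparison.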

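I found that once the model uses BN (batch normalization), whether or not StandardScaler is applied during data preprocessing makes little difference.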

from sklearn.preprocessing import MinMaxScaler, StandardScaler

[1]

[2] A Recipe for Training Neural Networks
