Methods for Shuffling Data in Deep Learning

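In deep learning, we usually need to shuffle the dataset while preprocessing it, before feeding it to the neural network. Suppose the dataset is a NumPy ndarray that stores the features and the labels in the same array, with the labels in the last column and all preceding columns holding the features. In that case NumPy can shuffle the whole dataset directly.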

The code is as follows:

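import numpy as np

dataset = np.load(filename)  # filename: path to a saved 2-D array, e.g. a .npy file
np.random.shuffle(dataset)   # shuffles the rows in place and returns None
features = dataset[:, :-1]   # every column except the last
labels = dataset[:, -1]      # the last column holds the labels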

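Note that np.random.shuffle() shuffles the array in place and returns None. If you write dataset = np.random.shuffle(dataset), dataset becomes None and the slicing on the next line fails with a TypeError.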
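A minimal sketch illustrates both points, using a small synthetic array instead of the file loaded with np.load (the toy data is an assumption made for the demo). np.random.shuffle returns None and only reorders along the first axis, so each row moves as a unit and keeps its label in the last column. When a shuffled copy is wanted instead, np.random.permutation returns one and leaves its input untouched.

```python
import numpy as np

# Hypothetical toy dataset: 5 samples, 2 feature columns, label in the last column.
# Each label is 10 * the first feature, so we can check that rows stay intact.
dataset = np.array([[i, i + 0.5, i * 10] for i in range(5)], dtype=float)

# In-place shuffle along axis 0 (the rows); the return value is None.
result = np.random.shuffle(dataset)
print(result)  # None

features = dataset[:, :-1]
labels = dataset[:, -1]
# Whole rows were moved, so every label still matches its own features.
assert np.all(labels == features[:, 0] * 10)

# np.random.permutation returns a shuffled copy; the input is not modified.
original = np.arange(5)
shuffled = np.random.permutation(original)
print(original)            # [0 1 2 3 4]
print(np.sort(shuffled))   # [0 1 2 3 4]
```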

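If features and labels are separate ndarrays, the following code can be used to shuffle the dataset and split it into a training set and a test set.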

import random

# features and labels are separate ndarrays with the same number of rows
n = features.shape[0]
random_index = random.sample(list(range(n)), n)  # a random permutation of 0..n-1
train_size = int(n * 0.6)                         # 60% of the samples for training
train_index = random_index[:train_size]
# complement of train_index, computed with a list comprehension
test_index = [i for i in random_index if i not in train_index]
train_features = features[train_index]
train_labels = labels[train_index]
test_features = features[test_index]
test_labels = labels[test_index]
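When both inputs are already NumPy arrays, a common alternative (a different technique from the code above, not the author's) is to draw one index permutation with NumPy's Generator.permutation and apply it to features and labels alike, which keeps every label aligned with its features. Because the permutation contains every index exactly once, the test indices are simply the part after train_size. The helper name split_shuffled, its parameters, and the toy arrays are hypothetical; the 60% ratio mirrors the code above.

```python
import numpy as np

def split_shuffled(features, labels, train_ratio=0.6, seed=None):
    """Hypothetical helper: shuffle with one shared permutation, then split."""
    assert len(features) == len(labels), "features and labels must have equal length"
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(features))  # each index 0..n-1 exactly once
    train_size = int(len(features) * train_ratio)
    train_index = perm[:train_size]
    test_index = perm[train_size:]         # the complement, by construction
    return (features[train_index], labels[train_index],
            features[test_index], labels[test_index])

# Toy data: row i of features is [2i, 2i + 1] and its label is i.
features = np.arange(20).reshape(10, 2)
labels = np.arange(10)
train_x, train_y, test_x, test_y = split_shuffled(features, labels, seed=0)
print(train_x.shape, test_x.shape)  # (6, 2) (4, 2)
```

Passing a seed makes the split reproducible, which helps when comparing models trained on the same partition.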

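Note that random.sample needs a sequence as its population: a list works, and so does a range object. To get test_index, I used a list comprehension that computes the complement of train_index. This is acceptable for small datasets, but it becomes very slow for large ones, because every `not in` check scans the whole train_index list. The following line computes the complement of train_index much faster: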

test_index = list(set(random_index).difference(set(train_index)))

This version is blazingly fast, haha.
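A small sketch can confirm that the set-based complement contains exactly the same indices as the list-comprehension version; the size n = 1000 is arbitrary and chosen only for the demo. One difference to keep in mind: a set has no guaranteed order, so the set-based test_index does not follow the shuffled order of random_index. Sort it, or keep the comprehension's order, if the order matters.

```python
import random

n = 1000                                   # arbitrary size for the demo
random_index = random.sample(range(n), n)  # random permutation of 0..n-1
train_size = int(n * 0.6)
train_index = random_index[:train_size]

# List comprehension: each "not in" is a linear scan of the train_index list.
test_slow = [i for i in random_index if i not in train_index]

# Set difference: hashing makes each membership check O(1) on average.
test_fast = list(set(random_index).difference(set(train_index)))

# Same indices, possibly in a different order (sets are unordered).
assert sorted(test_slow) == sorted(test_fast)
assert set(test_fast).isdisjoint(train_index)
print(len(test_fast))  # 400
```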
