Data Discretization and Normalization

Data analysis usually requires normalizing and discretizing the data first.

import codecs

import pandas as pd
from sklearn import preprocessing

url = "resultdata.txt"
nmi_all = []      # stores all mutual information values
data_number = 0   # counter
featurenum = 6    # number of features to read from each record
data_num = 100    # one hundred records
data = []         # parsed records, filled by open_file()


def open_file(url):
    # Read comma-separated records and keep the first `featurenum` fields of each.
    with codecs.open(url, "r") as f:
        for line in f.readlines():
            line1 = line.strip()
            line2 = line1.split(',')
            # Assumption: every field is numeric.
            tmp = [float(line2[i]) for i in range(0, featurenum)]
            data.append(tmp)


def gui_yi_hua(data):
    # Min-max normalization: scale every feature column to [0, 1].
    min_max_scaler = preprocessing.MinMaxScaler()
    tseg_minmax = min_max_scaler.fit_transform(data)
    # Optionally save the scaled data:
    # tseg_out = pd.DataFrame(tseg_minmax)
    # tseg_out.to_csv('tseg_out.csv')
    return tseg_minmax


def arry_discretization(tseg_minmax):
    # Discretize each normalized record into four equal-width bins.
    bins = [0, 0.25, 0.5, 0.75, 1]
    group_names = ['belongs to 0-0.25', 'belongs to 0.25-0.5',
                   'belongs to 0.5-0.75', 'belongs to 0.75-1']
    for tmp in tseg_minmax:
        print(tmp)
        ages = tmp
        # include_lowest=True keeps values equal to 0 in the first bin;
        # without it pd.cut would return NaN for them.
        cuts = pd.cut(ages, bins, labels=group_names, include_lowest=True)
        print(cuts)
        print(pd.Series(cuts).value_counts())


if __name__ == '__main__':
    open_file(url)
    arry_discretization(gui_yi_hua(data))
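
To try the same two steps without resultdata.txt, here is a minimal sketch on a small made-up array. The function names `min_max_normalize` and `discretize`, the sample values, and the short bin labels are all assumptions for illustration; the normalization is written out with NumPy rather than scikit-learn so the arithmetic is visible.

```python
import numpy as np
import pandas as pd


def min_max_normalize(x):
    """Scale each column of a 2-D array to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Constant columns would divide by zero; map them to 0 instead.
    col_range[col_range == 0] = 1.0
    return (x - col_min) / col_range


def discretize(values, bins=(0, 0.25, 0.5, 0.75, 1.0)):
    """Assign each value in [0, 1] to one of four equal-width bins."""
    labels = ["0-0.25", "0.25-0.5", "0.5-0.75", "0.75-1"]
    # include_lowest=True puts exact zeros in the first bin.
    return pd.cut(values, list(bins), labels=labels, include_lowest=True)


# Hypothetical sample: 4 records with 2 features each.
sample = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0], [5.0, 50.0]])
scaled = min_max_normalize(sample)
first_column_bins = discretize(scaled[:, 0])

print(scaled)
print(list(first_column_bins))
```

Because the bins are closed on the right, a value that falls exactly on an edge such as 0.25 lands in the lower bin.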


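Data normalization and discretization of continuous data

Data normalization: 1. 0-1 standardization; 2. z-score standardization. 1. 0-1 standardization records the maximum and minimum of the data and uses max - min as the base for normalizing it. 2. z-score standardization: the z-score is the difference between a score and the mean, divided by the standard deviation. The z value expresses the distance between the raw score and the population mean, measured in units of standard deviation...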
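
The two methods described above can be sketched in a few lines of NumPy. The function names and the sample scores are assumptions for illustration, and the z-score here uses the population standard deviation (NumPy's default, `ddof=0`), which matches the "population mean" wording but is still a choice.

```python
import numpy as np


def min_max(x):
    """0-1 standardization: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def z_score(x):
    """z-score standardization: (x - mean) / standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()  # population std (ddof=0)


# Hypothetical scores with mean 5 and population standard deviation 2.
scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

print(min_max(scores))   # values between 0 and 1
print(z_score(scores))   # distance from the mean in standard deviations
```

A z value of 2, for example, means the raw score lies two standard deviations above the mean.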

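Data normalization

Recently I searched online for posts about data normalization and read so many that it all became a jumble, so here is a summary. Normalization is a data preprocessing method: it takes the data you need to process and, through some algorithm, confines it to the range you need. This is done first to make later processing easier, and second to make the program converge faster when it runs. Take singular sample data, for example; singular sample data refers to data that, relative to other...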

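Data normalization

Preprocessing the input and output data of a neural network can speed up training. The preprocessing methods provided in MATLAB are: normalization, which maps each group of data to values between -1 and 1 and involves the functions premnmx, postmnmx, and tramnmx; and standardization, which transforms each group of data to mean 0 and variance 1 ...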
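
As a rough Python illustration of mapping data to [-1, 1] and mapping it back afterwards, here is a minimal sketch. It is not a port of the MATLAB functions named above; `scale_to_range`, `restore`, and the sample values are hypothetical names and data chosen for this example.

```python
import numpy as np


def scale_to_range(x, low=-1.0, high=1.0):
    """Linearly map x so that its minimum becomes `low` and its maximum `high`."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    scaled = low + (x - x_min) * (high - low) / (x_max - x_min)
    # Return the original extremes so the mapping can be undone later.
    return scaled, x_min, x_max


def restore(scaled, x_min, x_max, low=-1.0, high=1.0):
    """Invert scale_to_range using the stored minimum and maximum."""
    scaled = np.asarray(scaled, dtype=float)
    return x_min + (scaled - low) * (x_max - x_min) / (high - low)


# Hypothetical network inputs.
values = np.array([10.0, 15.0, 20.0])
scaled, v_min, v_max = scale_to_range(values)
recovered = restore(scaled, v_min, v_max)

print(scaled)
print(recovered)
```

Keeping the minimum and maximum from the training data matters because new inputs and the network's outputs have to be transformed with the same parameters.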
