Bayesian Inference, Naive Bayes Classification, and Bayes' Theorem

2022-06-11 15:00:12 · 4,511 words · 7,818 reads

Recently a project of mine needed Bayes' theorem and its related machinery, so I went through the material systematically again and took notes along the way.
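Before the classifier itself, Bayes' theorem in one line: P(A|B) = P(B|A)·P(A) / P(B). A minimal numeric sketch of applying it (the sensitivity/prevalence numbers below are illustrative, not from this post):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Illustrative numbers: a test with 99% sensitivity and a 5% false-positive
# rate, for a condition with 1% prevalence.
p_a = 0.01              # P(A): prior (prevalence)
p_b_given_a = 0.99      # P(B|A): sensitivity
p_b_given_not_a = 0.05  # P(B|~A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))  # -> 0.1667
```

Even with a "99% accurate" test, the posterior is only about 17%, because the prior is so small; this prior-times-likelihood structure is exactly what the classifier below multiplies together.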

**My implementation (with very detailed comments):**

```python
# -*- coding:utf-8 -*-
import copy  # deepcopy, for duplicating nested data structures

# If a nested structure is hard to follow, sketch it on paper;
# it becomes obvious once drawn.

class native_bayes:

    def __init__(self, character_vec_, class_vec_):
        """
        Constructor; see the calls at the bottom for the argument format.
        character_vec_: [("character_a", ["a1", "a2", "a3"]),
                        ("character_b", ["b1", "b2", "b3"])]
            a nested structure: an outer list of tuples, each tuple holding
            a feature name and a list of that feature's possible values
        class_vec_: ["class_x", "class_y"]
        """
        # A three-level nested dict used for counting;
        # worth drawing its structure on paper.
        character_condition_per = {}
        for character_name in character_vec_:
            character_condition_per[character_name[0]] = {}
            for character_value in character_name[1]:
                character_condition_per[character_name[0]][character_value] = {
                    'num': 0,             # count of this feature value
                    'condition_per': 0.0, # conditional probability of this value
                }
        # Records, per class, the conditional probability of each feature
        # value: a two-level dict with the three-level dict nested inside.
        self.class_set = {}
        for class_name in class_vec_:
            self.class_set[class_name] = {
                'num': 0,         # sample count for this class
                'class_per': 0.0, # prior probability of this class
                'character_condition_per': copy.deepcopy(character_condition_per),
            }
        # print("init", character_vec_, self.class_set)  # for debug

    def learn(self, sample_):
        """
        Training function. sample_ is a list of dicts:
        [{'character': {'character_a': 'a1', ...},  # feature vector
          'class_name': 'class_x'}]                 # class label
        """
        for each_sample in sample_:
            character_vec_ = each_sample['character']
            class_name = each_sample['class_name']
            data_for_class = self.class_set[class_name]
            data_for_class['num'] += 1
            # add 1 to the count of each observed feature value
            for character_name in character_vec_:  # iterating a dict yields its keys
                character_value = character_vec_[character_name]
                data_for_character = \
                    data_for_class['character_condition_per'][character_name][character_value]
                data_for_character['num'] += 1
        # counting finished; compute the final probabilities
        sample_num = len(sample_)
        for each_sample in sample_:
            character_vec_ = each_sample['character']
            class_name = each_sample['class_name']
            data_for_class = self.class_set[class_name]
            # prior probability of the class
            data_for_class['class_per'] = float(data_for_class['num']) / sample_num
            # conditional probability of each feature value
            for character_name in character_vec_:
                character_value = character_vec_[character_name]
                data_for_character = \
                    data_for_class['character_condition_per'][character_name][character_value]
                data_for_character['condition_per'] = \
                    float(data_for_character['num']) / data_for_class['num']
        # from pprint import pprint
        # pprint(self.class_set)  # for debug

    def classify(self, input_):
        """
        Classification function. input_ is a feature vector:
        {'character_a': 'a1', 'character_b': 'b1'}
        """
        best_class = ''
        max_per = 0.0
        for class_name in self.class_set:
            class_data = self.class_set[class_name]
            per = class_data['class_per']
            # multiply in the conditional probability of each feature value
            for character_name in input_:
                character_per_data = class_data['character_condition_per'][character_name]
                per = per * character_per_data[input_[character_name]]['condition_per']
            print(class_name, per)
            if per >= max_per:
                max_per = per  # track the best posterior seen so far
                best_class = class_name
        return best_class

# Naming convention: function parameters end with an underscore, module-level
# names do not, so the two are very easy to tell apart.
character_vec = [("character_a", ["a1", "a2", "a3"]),
                 ("character_b", ["b1", "b2", "b3"])]
class_vec = ["class_x", "class_y"]

bayes = native_bayes(character_vec, class_vec)  # create the classifier

# Training set. The concrete feature values were lost from the original
# listing; the ones below are placeholders in the same format.
sample = [
    {'character': {'character_a': 'a1', 'character_b': 'b1'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a1', 'character_b': 'b2'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a2', 'character_b': 'b1'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a2', 'character_b': 'b2'}, 'class_name': 'class_x'},
    {'character': {'character_a': 'a2', 'character_b': 'b3'}, 'class_name': 'class_y'},
    {'character': {'character_a': 'a3', 'character_b': 'b2'}, 'class_name': 'class_y'},
    {'character': {'character_a': 'a3', 'character_b': 'b3'}, 'class_name': 'class_y'},
    {'character': {'character_a': 'a1', 'character_b': 'b3'}, 'class_name': 'class_y'},
]
input_data = {'character_a': 'a1', 'character_b': 'b1'}  # also a placeholder

bayes.learn(sample)                # train
print(bayes.classify(input_data))  # test
```
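The hand-rolled nested dictionaries above can also be expressed with `collections.Counter`. This is a sketch of the same counting scheme, not the post's code, and the training pairs are made up for illustration:

```python
from collections import Counter, defaultdict

# Illustrative training data: (features, label) pairs.
samples = [
    ({"character_a": "a1", "character_b": "b1"}, "class_x"),
    ({"character_a": "a1", "character_b": "b2"}, "class_x"),
    ({"character_a": "a2", "character_b": "b2"}, "class_y"),
    ({"character_a": "a3", "character_b": "b3"}, "class_y"),
]

class_count = Counter()               # N(class)
feature_count = defaultdict(Counter)  # N(class, feature=value)
for features, label in samples:
    class_count[label] += 1
    for name, value in features.items():
        feature_count[label][(name, value)] += 1

def posterior(features, label):
    """Unnormalized P(label) * prod over features of P(value | label)."""
    per = class_count[label] / len(samples)
    for name, value in features.items():
        per *= feature_count[label][(name, value)] / class_count[label]
    return per

query = {"character_a": "a1", "character_b": "b2"}
best = max(class_count, key=lambda c: posterior(query, c))
print(best)  # -> class_x
```

Note that `posterior(query, "class_y")` collapses to exactly zero because `a1` never appears in a `class_y` sample; the original classifier has the same weakness, which is why practical naive Bayes implementations add Laplace smoothing to the counts.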
