Python Implementation of the Naive Bayes Algorithm

2022-02-28 14:58:54

Naive Bayes

Suppose we want to decide whether an email is spam. What we can observe is the distribution of words in that email; if we also know how often particular words appear in spam, Bayes' theorem lets us compute the probability that the email is spam.

One assumption made by the naive Bayes classifier is that every feature is equally important (the other simplifying assumption, which gives the method its "naive" name, is that features are conditionally independent given the class).
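As a toy illustration of the Bayes step for the spam example (all counts below are made up for the illustration, not taken from real data):

```python
# Toy Bayes' rule calculation with hypothetical counts:
# P(spam | "viagra") = P("viagra" | spam) * P(spam) / P("viagra")
p_word_given_spam = 4 / 20   # "viagra" appears in 4 of 20 spam mails (assumed)
p_spam = 20 / 100            # 20 of 100 mails are spam (assumed)
p_word = 5 / 100             # "viagra" appears in 5 of 100 mails overall (assumed)

p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)     # 0.8
```

With these counts, a mail containing the word is 80% likely to be spam, which is why such words make good filter features.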

The implementation defines the following functions:

loaddataset(): returns the toy posts and their class labels
createvocablist(dataset): builds a list of all distinct words in the dataset
setofwords2vec(vocablist, inputset): converts a document to a 0/1 set-of-words vector
bagofwords2vecmn(vocablist, inputset): converts a document to a count-based bag-of-words vector
trainnb0(trainmatrix, traincatergory): estimates the class prior and per-word conditional log probabilities
classifynb(vec2classify, p0vec, p1vec, pclass1): compares the two log posteriors and returns the class

```python
# coding=utf-8
from numpy import *

def loaddataset():
    postinglist = [['my', 'dog', 'has', 'flea', 'problems', 'help', 'please'],
                   ['maybe', 'not', 'take', 'him', 'to', 'dog', 'park', 'stupid'],
                   ['my', 'dalmation', 'is', 'so', 'cute', 'i', 'love', 'him'],
                   ['stop', 'posting', 'stupid', 'worthless', 'garbage'],
                   ['mr', 'licks', 'ate', 'my', 'steak', 'how', 'to', 'stop', 'him'],
                   ['quit', 'buying', 'worthless', 'dog', 'food', 'stupid']]
    classvec = [0, 1, 0, 1, 0, 1]  # 1 is abusive, 0 not
    return postinglist, classvec

# Build a list containing every distinct word
def createvocablist(dataset):
    vocabset = set()
    for document in dataset:
        vocabset = vocabset | set(document)
    return list(vocabset)

def setofwords2vec(vocablist, inputset):
    retvocablist = [0] * len(vocablist)
    for word in inputset:
        if word in vocablist:
            retvocablist[vocablist.index(word)] = 1
        else:
            print('word', word, 'not in dict')
    return retvocablist

# An alternative model: bag of words (keeps counts instead of 0/1)
def bagofwords2vecmn(vocablist, inputset):
    returnvec = [0] * len(vocablist)
    for word in inputset:
        if word in vocablist:
            returnvec[vocablist.index(word)] += 1
    return returnvec

def trainnb0(trainmatrix, traincatergory):
    numtraindoc = len(trainmatrix)
    numwords = len(trainmatrix[0])
    pabusive = sum(traincatergory) / float(numtraindoc)
    # Initialize to ones so that no single factor in the product of
    # probabilities can be 0
    p0num = ones(numwords)
    p1num = ones(numwords)
    p0denom = 2.0
    p1denom = 2.0
    for i in range(numtraindoc):
        if traincatergory[i] == 1:
            p1num += trainmatrix[i]
            p1denom += sum(trainmatrix[i])
        else:
            p0num += trainmatrix[i]
            p0denom += sum(trainmatrix[i])
    # Take logs for numerical precision; otherwise the product of many
    # small probabilities underflows to zero
    p1vect = log(p1num / p1denom)
    p0vect = log(p0num / p0denom)
    return p0vect, p1vect, pabusive

def classifynb(vec2classify, p0vec, p1vec, pclass1):
    p1 = sum(vec2classify * p1vec) + log(pclass1)  # element-wise mult
    p0 = sum(vec2classify * p0vec) + log(1.0 - pclass1)
    if p1 > p0:
        return 1
    else:
        return 0

def testingnb():
    listoposts, listclasses = loaddataset()
    myvocablist = createvocablist(listoposts)
    trainmat = []
    for postindoc in listoposts:
        trainmat.append(setofwords2vec(myvocablist, postindoc))
    p0v, p1v, pab = trainnb0(array(trainmat), array(listclasses))
    testentry = ['love', 'my', 'dalmation']
    thisdoc = array(setofwords2vec(myvocablist, testentry))
    print(testentry, 'classified as:', classifynb(thisdoc, p0v, p1v, pab))
    testentry = ['stupid', 'garbage']
    thisdoc = array(setofwords2vec(myvocablist, testentry))
    print(testentry, 'classified as:', classifynb(thisdoc, p0v, p1v, pab))

def main():
    testingnb()

if __name__ == '__main__':
    main()
```
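Two details in trainnb0 deserve a standalone sketch: initializing the counts to ones (a form of Laplace smoothing) keeps a single unseen word from zeroing out the whole product, and taking logs avoids floating-point underflow when many small probabilities are multiplied. The numbers below are made up for the demonstration:

```python
import numpy as np

# Hypothetical word counts for one class; the first word was never seen.
counts = np.array([0, 3, 1])
raw = counts / counts.sum()                     # contains a zero -> kills the product
smoothed = (counts + 1) / (counts.sum() + 2.0)  # no zeros, as with ones()/2.0 above

# Logs turn a long product of small probabilities into a sum:
probs = np.full(300, 0.01)
print(probs.prod())         # underflows to 0.0 in float64
print(np.log(probs).sum())  # finite: 300 * log(0.01)
```

This is why classifynb compares sums of log probabilities rather than products of raw probabilities.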

