Association Rules: Apriori

2021-08-10 01:54:44  Words: 4316  Views: 2855

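We begin with the story of beer and diapers (look it up online if you have not heard it); it is the one story anyone learning association rules should know.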

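Four concepts run through the whole Apriori algorithm: frequent itemsets, association rules, support, and confidence. The support of an itemset is the fraction of transactions that contain it, and an itemset is frequent when its support reaches a chosen minimum (minsupport). The confidence of a rule X -> Y is support(X ∪ Y) / support(X), and only rules whose confidence reaches minconf are kept.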
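A minimal Python 3 sketch that computes support and confidence on the four transactions returned by loaddataset() in the example below. The helper names support and confidence are hypothetical, not taken from the book.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(X -> Y) = support(X | Y) / support(X)."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Same four transactions as loaddataset()
dataset = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

print(support({2, 5}, dataset))       # 0.75
print(confidence({5}, {2}, dataset))  # 1.0
print(confidence({3}, {1}, dataset))  # about 0.667
```

The value for {3} -> {1} matches the first rule that generaterules prints later in this post.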

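If an itemset is not frequent, none of its supersets can be frequent either: every transaction that contains the superset also contains the itemset, so a superset's support can never exceed the support of its subsets. Apriori relies on this to discard candidates early.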
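The principle lets a candidate k-itemset be discarded without scanning the data whenever one of its (k-1)-subsets is missing from the frequent (k-1)-itemsets. A small Python 3 sketch of that check; has_infrequent_subset is a hypothetical name, and the frequent 2-itemsets are the ones reported for the example data at minsupport = 0.5 later in this post.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of the k-itemset `candidate` is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


# Frequent 2-itemsets for the example data at minsupport = 0.5
l2 = {frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])}

print(has_infrequent_subset(frozenset([2, 3, 5]), l2))  # False: keep as candidate
print(has_infrequent_subset(frozenset([1, 2, 3]), l2))  # True: {1, 2} is infrequent, prune
```

The book's apriorigen only performs the join; candidates that fail this check are instead filtered out by the support scan in scand.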

An example from Machine Learning in Action:

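#coding:utf-8

def loaddataset():
    # Toy transaction database: each inner list is one transaction
    return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]

def createc1(dataset):
    # C1: every distinct item as a one-item candidate set
    c1 = []
    for transaction in dataset:
        for item in transaction:
            if [item] not in c1:
                c1.append([item])
    c1.sort()
    # frozenset is hashable, so each candidate can be used as a dict key in scand
    return list(map(frozenset, c1))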

>>> d=apriori.loaddataset()

>>> c=apriori.createc1(d)

>>> print c

[frozenset([1]), frozenset([2]), frozenset([3]), frozenset([4]), frozenset([5])]

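createc1 collects the individual elements that make up the transactions, each wrapped as a one-item candidate set (C1).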

def scand(d, ck, minsupport):
    # Count how many transactions contain each candidate itemset
    sscnt = {}
    for tid in d:
        for can in ck:
            if can.issubset(tid):
                if can not in sscnt: sscnt[can] = 1
                else: sscnt[can] += 1
    numitems = float(len(d))
    retlist = []
    supportdata = {}
    for key in sscnt:
        support = sscnt[key]/numitems
        # Keep only candidates that reach the minimum support
        if support >= minsupport:
            retlist.insert(0, key)
        supportdata[key] = support
    return retlist, supportdata
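A standalone Python 3 sketch of the same counting step as scand, using collections.Counter (count_support is a hypothetical name, not part of apriori.py). On C1 of the example data at minsupport = 0.5, item 4 occurs in only one of the four transactions (support 0.25), which is why it is absent from the first level of the result.

```python
from collections import Counter


def count_support(transactions, candidates, min_support):
    """Return (frequent candidates, support of every candidate that occurs)."""
    tx_sets = [set(t) for t in transactions]
    # One count per (transaction, candidate) pair where the candidate is a subset
    counts = Counter(c for t in tx_sets for c in candidates if c <= t)
    n = len(tx_sets)
    support = {c: counts[c] / n for c in counts}
    frequent = [c for c in candidates if support.get(c, 0) >= min_support]
    return frequent, support


dataset = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
c1 = [frozenset([i]) for i in sorted({i for t in dataset for i in t})]
l1, sup = count_support(dataset, c1, 0.5)
print(sorted(sorted(s) for s in l1))  # [[1], [2], [3], [5]]
print(sup[frozenset([4])])            # 0.25, below 0.5, so {4} is dropped
```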

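def apriorigen(lk, k): # creates ck: merges pairs of sets into candidate k-itemsets
    retlist = []
    lenlk = len(lk)
    for i in range(lenlk):
        for j in range(i+1, lenlk):
            # Sort before slicing so the k-2 prefix does not depend on set iteration order
            l1 = sorted(lk[i])[:k-2]; l2 = sorted(lk[j])[:k-2]
            if l1 == l2: # if first k-2 elements are equal
                retlist.append(lk[i] | lk[j]) # set union
    return retlist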
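A standalone Python 3 sketch of the join step that apriorigen performs, run on the frequent 2-itemsets of the example (join_step is a hypothetical name). Only {2, 5} and {2, 3} share the same first item after sorting, so the single candidate 3-itemset is {2, 3, 5}. For k = 2 the compared prefix is empty, so every pair of single items is merged.

```python
def join_step(prev_level, k):
    """Merge pairs of (k-1)-itemsets whose first k-2 sorted items match."""
    out = []
    for i in range(len(prev_level)):
        for j in range(i + 1, len(prev_level)):
            a, b = prev_level[i], prev_level[j]
            # Sort before slicing so the prefix does not depend on set iteration order
            if sorted(a)[:k - 2] == sorted(b)[:k - 2]:
                out.append(a | b)
    return out


# Frequent 2-itemsets from the example run (minsupport = 0.5)
l2 = [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])]
print(join_step(l2, 3))  # [frozenset({2, 3, 5})]
```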

def apriori(dataset, minsupport = 0.5):
    c1 = createc1(dataset)
    d = list(map(set, dataset)) # a list, so it can be scanned on every pass
    l1, supportdata = scand(d, c1, minsupport)
    l = [l1]
    k = 2
    while (len(l[k-2]) > 0):
        ck = apriorigen(l[k-2], k) # each pass merges into 2-itemsets, 3-itemsets, ...
        lk, supk = scand(d, ck, minsupport) # scan db to get lk
        supportdata.update(supk)
        l.append(lk)
        k += 1
    return l, supportdata

>>> l,suppdata=apriori.apriori(d)     

>>> print l

[[frozenset([1]), frozenset([3]), frozenset([2]), frozenset([5])], [frozenset([1, 3]), frozenset([2, 5]), frozenset([2, 3]), frozenset([3, 5])], [frozenset([2, 3, 5])], []]

>>> print suppdata
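For readers on Python 3, a compact, self-contained sketch of the whole level-wise loop (apriori_levels is a hypothetical name, not the book's function). It swaps in a simpler join than apriorigen, taking the union of any two frequent (k-1)-itemsets that has exactly k items, and then applies the Apriori-principle prune; after pruning this leaves the same candidates as the sorted-prefix join. Unlike the book version, it does not append a final empty level.

```python
from itertools import combinations


def apriori_levels(dataset, min_support=0.5):
    """Level-wise Apriori: returns (levels, support), levels[k-1] = frequent k-itemsets."""
    transactions = [frozenset(t) for t in dataset]
    n = len(transactions)

    def frequent(candidates):
        # Scan the data once and keep candidates whose support meets min_support
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    items = sorted({i for t in transactions for i in t})
    level = frequent(frozenset([i]) for i in items)
    levels, support = [], {}
    k = 2
    while level:
        levels.append(sorted(level, key=sorted))
        support.update(level)
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        cands = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset must itself be frequent (Apriori principle)
        cands = {c for c in cands
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent(cands)
        k += 1
    return levels, support


levels, support = apriori_levels([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]])
for lvl in levels:
    print([sorted(s) for s in lvl])
# [[1], [2], [3], [5]]
# [[1, 3], [2, 3], [2, 5], [3, 5]]
# [[2, 3, 5]]
```

These are the same frequent itemsets as in the list l printed above, only in sorted order.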

def generaterules(l, supportdata, minconf=0.7):  # supportdata is a dict coming from scand
    bigrulelist = []
    for i in range(1, len(l)): # only get the sets with two or more items
        for freqset in l[i]:
            h1 = [frozenset([item]) for item in freqset]
            if (i > 1):
                rulesfromconseq(freqset, h1, supportdata, bigrulelist, minconf)
            else:
                calcconf(freqset, h1, supportdata, bigrulelist, minconf)
    return bigrulelist

def calcconf(freqset, h, supportdata, brl, minconf=0.7):
    prunedh = [] # create new list to return
    for conseq in h:
        conf = supportdata[freqset]/supportdata[freqset-conseq] # calc confidence
        if conf >= minconf:
            print("%s --> %s conf: %s" % (freqset-conseq, conseq, conf))
            brl.append((freqset-conseq, conseq, conf)) # record the rule
            prunedh.append(conseq) # only passing consequents are merged further
    return prunedh

def rulesfromconseq(freqset, h, supportdata, brl, minconf=0.7):
    print("#h# %s" % (h,)) # debug output: current list of consequents
    m = len(h[0])
    if (len(freqset) > (m + 1)): # try further merging
        hmp1 = apriorigen(h, m+1) # create hm+1 new candidates
        hmp1 = calcconf(freqset, hmp1, supportdata, brl, minconf)
        if (len(hmp1) > 1): # need at least two sets to merge
            rulesfromconseq(freqset, hmp1, supportdata, brl, minconf)

>>> rules=apriori.generaterules(l,suppdata,minconf = 0.5)

frozenset([3]) --> frozenset([1]) conf: 0.666666666667

frozenset([1]) --> frozenset([3]) conf: 1.0

frozenset([5]) --> frozenset([2]) conf: 1.0

frozenset([2]) --> frozenset([5]) conf: 1.0

frozenset([3]) --> frozenset([2]) conf: 0.666666666667

frozenset([2]) --> frozenset([3]) conf: 0.666666666667

frozenset([5]) --> frozenset([3]) conf: 0.666666666667

frozenset([3]) --> frozenset([5]) conf: 0.666666666667

#h# [frozenset([2]), frozenset([3]), frozenset([5])]

frozenset([5]) --> frozenset([2, 3]) conf: 0.666666666667

frozenset([3]) --> frozenset([2, 5]) conf: 0.666666666667

frozenset([2]) --> frozenset([3, 5]) conf: 0.666666666667

#h# [frozenset([2, 3]), frozenset([2, 5]), frozenset([3, 5])]
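The log above has a gap: for the 3-itemset {2, 3, 5}, generaterules calls rulesfromconseq, which immediately merges the one-item consequents in h1 into two-item ones, so calcconf never evaluates rules such as {2, 3} --> {5} or {3, 5} --> {2}, both of which have confidence 1.0 on this data. A brute-force Python 3 sketch that tries every consequent size with itertools.combinations follows (all_rules is a hypothetical name; the supports are those of the example dataset). It drops the consequent pruning that rulesfromconseq performs, so it is only meant for small itemsets.

```python
from itertools import combinations


def all_rules(freq_set, support, min_conf):
    """Yield (antecedent, consequent, conf) for every split of freq_set into two non-empty parts."""
    items = sorted(freq_set)
    for r in range(1, len(items)):  # consequent sizes 1 .. n-1
        for conseq in combinations(items, r):
            conseq = frozenset(conseq)
            ante = freq_set - conseq
            conf = support[freq_set] / support[ante]
            if conf >= min_conf:
                yield ante, conseq, conf


# Supports of the example data (4 transactions)
support = {
    frozenset([2]): 0.75, frozenset([3]): 0.75, frozenset([5]): 0.75,
    frozenset([2, 3]): 0.5, frozenset([2, 5]): 0.75, frozenset([3, 5]): 0.5,
    frozenset([2, 3, 5]): 0.5,
}
for ante, conseq, conf in all_rules(frozenset([2, 3, 5]), support, 0.5):
    print(sorted(ante), '-->', sorted(conseq), round(conf, 3))
```

The first three printed rules, [3, 5] --> [2] 1.0, [2, 5] --> [3] 0.667 and [2, 3] --> [5] 1.0, are the ones missing from the log; the last three match it.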
