Affinity Analysis

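I have recently been reading Robert Layton's book on data mining. I am writing these notes partly to reinforce what I learn and partly so that I can look things up later.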

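To increase demand, items that customers tend to buy together are usually placed together. This makes customers more likely to buy them and stimulates consumption.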

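The simplest example: if you buy lamb rolls, you will surely want cuttlefish balls too, and once you have the cuttlefish balls you will think of hot pot soup base. If you did not think of it yourself, the store has probably placed it near the lamb rolls on purpose so that you notice it easily.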

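First, what is affinity analysis? It looks for associations between certain things. For example, online recommendations based on your personal preferences or on the items you browse most often stimulate consumption to some extent.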

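Suppose a store is deciding where to place bread, milk, cheese, apples, and bananas. It will follow some rules. For example, a customer who has bought bread is quite likely to buy milk as well, so bread and milk are placed together. Every rule consists of a premise and a conclusion. In this example, the premise is that the customer buys bread, and the conclusion is that the customer is likely to buy milk.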

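A rule is judged by how likely it is to hold: the higher the likelihood, the more people in the population follow the rule. The usual measures are support and confidence. Support is the number of times the rule holds in the dataset. Confidence is the fraction of the cases in which the premise applies where the conclusion holds as well.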
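To make the two measures concrete, here is a minimal sketch on a tiny hand-made table. The five rows and the helper name `rule_stats` are made up for illustration and are not part of the dataset used later.

```python
import numpy as np

# Hypothetical toy data: column 0 is bread, column 1 is milk; rows are customers.
toy = np.array([
    [1, 1],
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
])


def rule_stats(data, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion."""
    has_premise = data[:, premise] == 1
    has_both = has_premise & (data[:, conclusion] == 1)
    support = int(has_both.sum())        # number of rows where the rule holds
    n_premise = int(has_premise.sum())   # number of rows where the premise applies
    confidence = support / n_premise if n_premise else 0.0
    return support, confidence


support, confidence = rule_stats(toy, premise=0, conclusion=1)
print(support, confidence)
```

Four customers bought bread and three of them also bought milk, so the rule "bread, then milk" has a support of 3 and a confidence of 3/4.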

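The store considers 5 products. For convenience, we use a matrix whose columns represent the 5 products and whose rows represent individual customers. A 1 means the product was bought and a 0 means it was not. We also ignore people who walk around and buy nothing.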

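The data could be collected with a questionnaire (provided it is large enough and objective enough). To save effort, we generate it with random numbers instead.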

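When choosing the thresholds, we can build in a few cause-and-effect relationships so that the generated dataset is more realistic. For example, people who buy bread usually buy some milk, so the threshold for buying milk should be higher when bread was bought than when it was not. Likewise, the threshold for buying bananas after buying apples should be higher than without apples. We can also reason that after eating cheese people want some refreshing fruit, so that threshold can be set higher as well.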

import numpy as np

# Create a 100 x 5 matrix: one row per customer, one column per product
x = np.zeros((100, 5), dtype='bool')
features = ["bread", "milk", "cheese", "apples", "bananas"]

for i in range(x.shape[0]):
    if np.random.random() < 0.3:
        # Likes bread
        x[i][0] = 1
        if np.random.random() < 0.6:
            # Likes milk
            x[i][1] = 1
        if np.random.random() < 0.2:
            # Likes cheese
            x[i][2] = 1
        if np.random.random() < 0.25:
            # Likes apples
            x[i][3] = 1
        if np.random.random() < 0.5:
            # Likes bananas
            x[i][4] = 1
    else:
        # Did not buy bread, so buying milk is less likely
        if np.random.random() < 0.4:
            # Likes milk
            x[i][1] = 1
            if np.random.random() < 0.2:
                # Likes cheese
                x[i][2] = 1
            if np.random.random() < 0.3:
                # Likes apples
                x[i][3] = 1
            if np.random.random() < 0.5:
                # Likes bananas
                x[i][4] = 1
        else:
            if np.random.random() < 0.8:
                # Likes cheese
                x[i][2] = 1
            if np.random.random() < 0.6:
                # Likes apples
                x[i][3] = 1
            if np.random.random() < 0.7:
                # Likes bananas
                x[i][4] = 1
    if x[i].sum() == 0:
        # Ignore people who only browse: everyone buys at least bananas
        x[i][4] = 1

# Save the dataset so that the analysis code can load it from disk
np.savetxt("affinity_dataset.txt", x, fmt='%d')
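As a sanity check on the idea of conditional thresholds, the sketch below generates only the bread and milk columns in a vectorized way and measures how often milk is bought with and without bread. The seed, the sample size of 10,000, and the use of `np.random.default_rng` are my own choices for illustration, not part of the original notes.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed so the run is repeatable
n = 10_000                       # larger than 100 so the rates are stable

# 30% of customers buy bread.
bread = rng.random(n) < 0.3

# The milk threshold depends on bread: 0.6 with bread, 0.4 without.
milk = rng.random(n) < np.where(bread, 0.6, 0.4)

# Observed share of milk buyers among bread buyers and among the rest.
p_milk_given_bread = milk[bread].mean()
p_milk_given_no_bread = milk[~bread].mean()

print(p_milk_given_bread, p_milk_given_no_bread)
```

The two printed rates should land close to the thresholds 0.6 and 0.4, which is the dependency the analysis is later expected to pick up as a higher confidence for the bread-to-milk rule.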

That gives us the dataset. Next, we count how often each case occurs:

from collections import defaultdict

import numpy as np

dataset_filename = "affinity_dataset.txt"
x = np.loadtxt(dataset_filename)
n_samples, n_features = x.shape
# print(n_samples, n_features)

# Dictionaries that hold the counts, keyed by (premise, conclusion)
valid_rules = defaultdict(int)
invalid_rules = defaultdict(int)
num_occurences = defaultdict(int)
confidence = defaultdict(float)

# Feature names, in column order
features = ["bread", "milk", "cheese", "apples", "bananas"]

for sample in x:
    # premise is the index of the product assumed to be bought
    for premise in range(n_features):
        if sample[premise] == 0:
            continue
        num_occurences[premise] += 1
        # conclusion is the index of the product the rule predicts
        for conclusion in range(n_features):
            if premise == conclusion:
                continue
            if sample[conclusion] == 1:
                # The rule holds for this customer
                valid_rules[(premise, conclusion)] += 1
            else:
                # The rule does not hold for this customer
                invalid_rules[(premise, conclusion)] += 1

# Support is the number of times each rule holds
support = valid_rules

for premise, conclusion in valid_rules.keys():
    rule = (premise, conclusion)
    # Confidence: times the rule holds divided by times the premise occurs
    confidence[rule] = valid_rules[rule] / num_occurences[premise]
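The nested loops above can be cross-checked with a matrix product. This is a rough sketch of an alternative, not the method from the book. The four-row `data` array and the names `co_counts`, `support_matrix`, and `confidence_matrix` are hypothetical.

```python
import numpy as np

# Hypothetical 4-customer sample; columns are bread, milk, cheese, apples, bananas.
data = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0],
])

# co_counts[p, c] = number of rows in which products p and c were both bought.
co_counts = data.T @ data

# The diagonal counts how often each product was bought at all.
occurrences = np.diag(co_counts)

# Support of the rule p -> c; a rule from a product to itself is not meaningful.
support_matrix = co_counts.copy()
np.fill_diagonal(support_matrix, 0)

# Confidence: divide each row by how often its premise occurs (0 if it never does).
with np.errstate(divide="ignore", invalid="ignore"):
    confidence_matrix = np.where(
        occurrences[:, None] > 0,
        support_matrix / occurrences[:, None],
        0.0,
    )

print(support_matrix[0, 1], confidence_matrix[0, 1])  # rule: bread -> milk
```

Entry `[p, c]` of each matrix corresponds to the key `(p, c)` in the `support` and `confidence` dictionaries, so on the same data the two approaches should agree.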

Next we write a function that takes a premise and a conclusion and prints the support and confidence computed from the dataset.

def print_rule(premise, conclusion, support, confidence, features):
    # Look up the readable names for the two feature indices
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    print("Rule: if a person buys {0} they will also buy {1}".format(premise_name, conclusion_name))
    print(" -- support: {0}".format(support[(premise, conclusion)]))
    print(" -- confidence: {0:.3f}".format(confidence[(premise, conclusion)]))

We can also sort the rules by confidence:

from operator import itemgetter

# Sort the (rule, confidence) pairs by confidence, highest first
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)
for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    print_rule(premise, conclusion, support, confidence, features)

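Note that the keys in confidence differ from entry to entry: each key is a distinct (premise, conclusion) tuple, so itemgetter(1) selects the value by its position within each (key, value) pair. This is unlike the more familiar use of itemgetter, where every record shares the same key name, e.g.: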

rows = [...]  # a list of dicts that all contain an 'fname' key

rows_by_fname = sorted(rows, key=itemgetter('fname'))
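The two uses of `itemgetter` can be compared side by side in a small runnable sketch. All values and records below are hypothetical placeholders chosen for illustration.

```python
from operator import itemgetter

# Hypothetical confidence values keyed by (premise, conclusion) tuples.
confidence = {(0, 1): 0.52, (1, 0): 0.30, (2, 3): 1.00}

# items() yields ((premise, conclusion), value) pairs. itemgetter(1) picks the
# value by position, so it does not matter that every key is different.
by_value = sorted(confidence.items(), key=itemgetter(1), reverse=True)

# Hypothetical records that all share the same field names.
rows = [
    {"fname": "Carol", "uid": 3},
    {"fname": "Alice", "uid": 1},
    {"fname": "Bob", "uid": 2},
]

# Here itemgetter('fname') looks up the same key name in every dict.
rows_by_fname = sorted(rows, key=itemgetter("fname"))

print([rule for rule, _ in by_value])
print([row["fname"] for row in rows_by_fname])
```

In the first call the argument to `itemgetter` is a position in a tuple, and in the second it is a dictionary key. The function is the same, only what it indexes into changes.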

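Acting on what this reveals about customer behavior lets the store cater to customers' buying habits to some extent, and sales rise as a result.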
