Machine Learning Mini Case Studies: A Small Example with the RFM Model

import pandas as pd

in [75]:

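# Raw string so that backslashes such as \r in the Windows path are not read as escape sequences
trad_flow = pd.read_csv(r'd:\python\script\rfm_trad_flow.csv', encoding='gbk')  # the file is GBK-encoded, so the encoding must be set explicitly
trad_flow.head()  # shows the first five rows by default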

out[75]:

   transid  cumid              time  amount type_label    type
0     9407  10001  14jun09:17:58:34   199.0         正常  normal
1     9625  10001  16jun09:15:09:13   369.0         正常  normal
2    11837  10001  01jul09:14:50:36   369.0         正常  normal
3    26629  10001  14dec09:18:05:32   359.0         正常  normal
4    30850  10001  12apr10:13:02:20   399.0         正常  normal

in [21]:

m = trad_flow.groupby(['cumid','type'])[['amount']].sum()

in [48]:

trains_m = pd.pivot_table(m,index='cumid',columns='type',values='amount')
trains_m.head()

out[48]:

type    normal  presented  special_offer  returned_goods
cumid
10001   3608.0        0.0          420.0          -694.0
10002   1894.0        0.0            NaN          -242.0
10003   3503.0        0.0          156.0          -224.0
10004   2979.0        0.0          373.0           -40.0
10005   2368.0        0.0            NaN          -249.0
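The group-then-pivot steps above can also be written as a single `pivot_table` call. The following is a minimal sketch, assuming a small made-up frame named `toy` with the same column names as `trad_flow` (the values are illustrative and do not come from rfm_trad_flow.csv). Passing `fill_value=0` fills in customer/type combinations that have no rows, so a separate `fillna` step may not be needed.

```python
import pandas as pd

# Hypothetical toy data with the same columns as trad_flow (values are made up)
toy = pd.DataFrame({
    'cumid':  [10001, 10001, 10001, 10002, 10002],
    'type':   ['normal', 'normal', 'special_offer', 'normal', 'returned_goods'],
    'amount': [199.0, 369.0, 420.0, 150.0, -50.0],
})

# Sum amount per customer and transaction type in one step;
# fill_value=0 replaces the NaN that appears when a customer has no rows of a type
toy_m = pd.pivot_table(toy, index='cumid', columns='type',
                       values='amount', aggfunc='sum', fill_value=0)
print(toy_m)
```

Whether zero is the right stand-in for "no transactions of this type" depends on how the column is used later; for a monetary total it is usually a reasonable choice.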

in [47]:

f = trad_flow.groupby(['cumid','type'])[['transid']].count()
f.head()

out[47]:

                      transid
cumid type
10001 normal               15
      presented             8
      special_offer         2
      returned_goods        2
10002 normal               12

in [46]:

r = trad_flow.groupby(['cumid','type'])[['time']].max()
r.head()

out[46]:

                                    time
cumid type
10001 normal            21jul09:09:31:26
      presented         31mar10:20:29:48
      special_offer     12oct09:10:59:13
      returned_goods    10jul10:20:41:54
10002 normal            29jul09:19:21:41
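One caveat about the recency step: `time` is still a string column at this point, so `max()` compares text rather than dates. The outputs above hint at this: customer 10001 has a normal purchase on 12apr10, yet the reported maximum for that group is 21jul09. The sketch below shows one possible fix on a made-up frame named `toy`. The format string `'%d%b%y:%H:%M:%S'` is an assumption inferred from the values shown (day, abbreviated month, two-digit year, then time), and `time_parsed` is a hypothetical column name.

```python
import pandas as pd

# Hypothetical toy rows in the same layout as trad_flow (values are made up)
toy = pd.DataFrame({
    'cumid': [10001, 10001, 10001],
    'type':  ['normal', 'normal', 'normal'],
    'time':  ['14jun09:17:58:34', '21jul09:09:31:26', '12apr10:13:02:20'],
})

# max() on the raw strings picks the lexicographically largest value
r_str = toy.groupby(['cumid', 'type'])['time'].max()

# Parse day, abbreviated month, two-digit year, then hour:minute:second
toy['time_parsed'] = pd.to_datetime(toy['time'], format='%d%b%y:%H:%M:%S')

# max() on real datetimes returns the most recent transaction
r_dt = toy.groupby(['cumid', 'type'])['time_parsed'].max()

print(r_str)
print(r_dt)
```

With parsed datetimes, recency can then be expressed as the number of days between a chosen reference date and each customer's latest transaction.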

in [53]:

trains_m['special_offer'] = trains_m['special_offer'].fillna(0)
trains_m['special_offer'].head()

out[53]:

cumid
10001    420.0
10002      0.0
10003    156.0
10004    373.0
10005      0.0
Name: special_offer, dtype: float64

in [67]:

trains_m['spe_ratio'] = trains_m['special_offer']/(trains_m['special_offer']+trains_m['normal'])
trains_m['spe_ratio'].head()

out[67]:

cumid
10001    0.104270
10002    0.000000
10003    0.042635
10004    0.111277
10005    0.000000
Name: spe_ratio, dtype: float64

in [68]:

m_rank = trains_m.sort_values('spe_ratio', ascending=False).head()  # keep the five customers with the highest special-offer share
m_rank.head()

out[68]:

type    normal  presented  special_offer  returned_goods  spe_ratio
cumid
10151    765.0        0.0          870.0             NaN   0.532110
40033   1206.0        0.0          761.0          -848.0   0.386884
40236   1155.0        0.0          691.0          -793.0   0.374323
30225   1475.0        0.0          738.0          -301.0   0.333484
20068   1631.0        0.0          731.0          -239.0   0.309483

in [74]:

pd.qcut(m_rank['spe_ratio'],4)

out[74]:

cumid
10151    (0.387, 0.532]
40033    (0.374, 0.387]
40236    (0.333, 0.374]
30225    (0.308, 0.333]
20068    (0.308, 0.333]
Name: spe_ratio, dtype: category
Categories (4, interval[float64]): [(0.308, 0.333] < (0.333, 0.374] < (0.374, 0.387] < (0.387, 0.532]]
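Because `m_rank` holds only the top five rows, the quartiles above are computed over five customers. To score every customer, `pd.qcut` would more likely be applied to the full `spe_ratio` column. The sketch below illustrates the idea on a made-up Series named `ratios`; the values, the customer ids, and the 1 to 4 labels are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical special-offer ratios for eight customers (values are made up)
ratios = pd.Series(
    [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.53],
    index=pd.Index(range(10001, 10009), name='cumid'),
    name='spe_ratio',
)

# Split into four equal-sized groups and label them 1 (lowest) to 4 (highest)
scores = pd.qcut(ratios, 4, labels=[1, 2, 3, 4])
print(scores)
```

On the real data many customers have a ratio of exactly 0 after `fillna(0)`, so some quantile edges may coincide and `qcut` may raise an error about non-unique bin edges. Its `duplicates='drop'` option merges such edges, though the number of labels then has to match the reduced number of bins.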
