Building Your Own Recommender System with Python LightFM

About the data

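There are many open-source LightFM projects on GitHub, but most of them use open-source movie datasets from abroad and never explain the data format LightFM expects. Many people follow the steps in those projects, then try to swap in their own data and don't know where to start. I spent a week studying the framework and hope to help anyone who is currently just as confused.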

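LightFM is a Python implementation of many popular recommendation algorithms for both implicit and explicit feedback, including efficient implementations of the BPR and WARP ranking losses. It is easy to use, fast (thanks to multithreaded model estimation), and produces high-quality results.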

Official documentation: [

I am using Win10 + Python 3.5. Install LightFM directly with pip install lightfm. If it fails with error: Microsoft Visual C++ 14.0 is required, see this article (

For LightFM, I mainly cover how to prepare positive-feedback data.

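First, build a DataFrame whose index is the users and whose columns are the items. Each cell holds the user's rating of the item: 1 if there was positive feedback, 0 if there was no interaction.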
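If your data starts out as a raw interaction log rather than a ready-made table, here is a minimal sketch of one way to build that 0/1 user × item DataFrame with pandas. The column names `user_id` and `item_id` and the sample rows are hypothetical. The dictionary at the end matters because LightFM refers to users and items by their row and column position in the matrix, not by their original IDs.

```python
import pandas as pd

# Hypothetical interaction log: one row per positive (user, item) event.
log = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "item_id": ["i1", "i3", "i2", "i1", "i2", "i2"],
})

# Rows = users, columns = items, cell = number of events for that pair.
counts = pd.crosstab(log["user_id"], log["item_id"])

# Binarize: 1 if the user interacted with the item at least once, else 0.
df = (counts > 0).astype(int)

# Keep the mapping from original user IDs to row positions (LightFM's internal IDs).
user_index = {user: row for row, user in enumerate(df.index)}

print(df)
print(user_index)
```

Repeated events collapse to a single 1 here (u3 interacted with i2 twice). That matches the "positive feedback = 1" convention above.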

Import csr_matrix from scipy.sparse (the compressed sparse row matrix class):

from scipy.sparse import csr_matrix

Use csr_matrix to compress the DataFrame from the first step into a sparse matrix:

data1 = csr_matrix(df)

#convert the compressed matrix back to a dense array (for inspection)

data1.toarray()
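The word "compressed" means the CSR format stores only the non-zero cells plus two index arrays. The small matrix below is hypothetical. The sketch shows what `csr_matrix` actually keeps and that `toarray()` recovers the original.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 3-user x 4-item 0/1 interaction matrix.
dense = np.array([
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
])

sparse = csr_matrix(dense)

print(sparse.nnz)      # number of stored (non-zero) entries
print(sparse.data)     # the non-zero values, row by row
print(sparse.indices)  # column index of each stored value
print(sparse.indptr)   # row i's values are data[indptr[i]:indptr[i + 1]]

# toarray() rebuilds the full dense matrix, zeros included.
assert (sparse.toarray() == dense).all()
```

For real data, where most users touch only a few items, storing only the non-zeros is what keeps the matrix small.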

4. Build the model

from lightfm import LightFM
model = LightFM()  # full signature: LightFM(no_components=10, k=5, n=10, learning_schedule='adagrad', loss='logistic', learning_rate=0.05, rho=0.95, epsilon=1e-06, item_alpha=0.0, user_alpha=0.0, max_sampled=10, random_state=None)
#no_components: length of the user_embedding and item_embedding vectors
#learning_schedule: how the learning rate is updated ('adagrad' or 'adadelta')
#loss: the loss function ('logistic', 'bpr', 'warp', 'warp-kos')
#item_alpha, user_alpha: L2 penalties on the item and user features
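To see what the `loss` choice changes, compare the textbook forms of two options. Logistic loss scores each (user, item) cell on its own, with label +1 or -1. BPR compares a positive item against a sampled negative item for the same user. This is a hedged NumPy sketch of the formulas only. It is not LightFM's Cython training code, and the scores are hypothetical. According to the LightFM docs, 'bpr' and 'warp' are intended for data with only positive interactions, like the 0/1 matrix above.

```python
import numpy as np

def logistic_loss(score, label):
    """Pointwise loss on one (user, item) cell; label is +1 or -1."""
    return np.log1p(np.exp(-label * score))

def bpr_loss(pos_score, neg_score):
    """Pairwise loss: -log(sigmoid(pos - neg)); small when the positive outranks the negative."""
    return np.log1p(np.exp(-(pos_score - neg_score)))

# Hypothetical model scores for one user.
pos, neg = 2.0, 0.5
print(logistic_loss(pos, 1) + logistic_loss(neg, -1))
print(bpr_loss(pos, neg))   # correct order -> small loss
print(bpr_loss(neg, pos))   # wrong order -> larger loss
```

BPR only cares about the score difference, which is why it targets ranking quality rather than calibrated per-cell scores.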

5. Train the model

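model.fit(data1)
#fit(interactions, user_features=None, item_features=None, sample_weight=None, epochs=1, num_threads=1, verbose=False)
#interactions: the training data, a scipy sparse matrix of shape [users, items] (the docs specify COO; a CSR matrix such as data1 is converted internally)
#user_features, item_features: feature matrices for users and items, as scipy.sparse.csr_matrix
#sample_weight: per-interaction weights, shape [users, items]
#epochs: number of passes over the training data; more epochs take longer
#num_threads: number of threads
#verbose: whether to print training progress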
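The LightFM docs state that `sample_weight` must have the same row and column arrays as the interactions matrix. One safe way to build it is to reuse those arrays directly. In this sketch the interaction matrix and the weights (for example, purchases weighted above clicks) are hypothetical. The actual `fit` call is left as a comment so the snippet runs without lightfm installed.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical 0/1 interaction matrix in COO form.
interactions = coo_matrix(np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
], dtype=np.float32))

# Hypothetical weights, one per stored interaction, in the same order as interactions.data.
raw_weights = np.array([1.0, 3.0, 1.0, 2.0, 0.5], dtype=np.float32)

# Reuse row/col so the weight matrix has exactly the same sparsity pattern.
sample_weight = coo_matrix(
    (raw_weights, (interactions.row, interactions.col)),
    shape=interactions.shape,
)

# With lightfm installed (not run here):
# model.fit(interactions, sample_weight=sample_weight, epochs=10, num_threads=2)
print(sample_weight.toarray())
```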

6. Usage

user_biases, user_embeddings = model.get_user_representations()  # returns (user_biases, user_embeddings)
item_biases, item_embeddings = model.get_item_representations()  # same, for items
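model.predict_rank(data1)
#predict_rank(test_interactions, train_interactions=None, item_features=None, user_features=None, num_threads=1, check_intersections=True)
#test_interactions: the matrix of (user, item) pairs to rank
#returns a [users, items] sparse CSR matrix: for each (user, item) pair stored in data1, the rank of that item among all items for that user (0 = top of the list); call .toarray() to display it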
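Ranks are easier to read once you see how they relate to scores. A rank is the number of items a user scores above the item in question. This is a hedged NumPy sketch of that idea. The score matrix is hypothetical, and `ranks_from_scores` is a hypothetical helper, not a LightFM function. Ties are not counted as "above" here, and LightFM's own tie handling and its `train_interactions` exclusion are not reproduced.

```python
import numpy as np
from scipy.sparse import csr_matrix

def ranks_from_scores(scores, test):
    """For each stored (user, item) in `test`, return the item's 0-based rank
    among all items for that user (0 = highest score), as a CSR matrix."""
    scores = np.asarray(scores)
    rows, cols = csr_matrix(test).nonzero()
    # Rank = number of items this user scored strictly higher than the test item.
    ranks = np.array([(scores[u] > scores[u, i]).sum() for u, i in zip(rows, cols)])
    return csr_matrix((ranks, (rows, cols)), shape=scores.shape)

# Hypothetical scores for 2 users x 3 items, and the pairs we want ranked.
scores = np.array([[0.9, 0.1, 0.5],
                   [0.2, 0.8, 0.3]])
test = np.array([[0, 0, 1],
                 [1, 0, 0]])

print(ranks_from_scores(scores, test).toarray())
```

User 0's item 2 is beaten only by item 0, so its rank is 1. User 1's item 0 is beaten by both other items, so its rank is 2.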
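import numpy as np
model.predict(1, np.array([1, 2, 3]))
#predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)
#predicts the scores of a single user (internal index 1, i.e. row 1 of data1) for one or more items (item column indices passed as an array)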
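The scores from `predict` follow LightFM's scoring formula: the dot product of the user and item representations plus both biases. The function below is a sketch of that formula using arrays shaped like the outputs of `get_user_representations()` and `get_item_representations()`. The function name and the small arrays are hypothetical. With a fitted model, passing in the real representations should give the same numbers as `model.predict`, up to float32 rounding.

```python
import numpy as np

def predict_scores(user_id, item_ids, user_biases, user_embeddings,
                   item_biases, item_embeddings):
    """score(u, i) = dot(user_embeddings[u], item_embeddings[i]) + user_biases[u] + item_biases[i]"""
    item_ids = np.asarray(item_ids)
    user_vec = np.asarray(user_embeddings)[user_id]
    item_vecs = np.asarray(item_embeddings)[item_ids]
    return (item_vecs @ user_vec
            + np.asarray(user_biases)[user_id]
            + np.asarray(item_biases)[item_ids])

# Hypothetical representations: 2 users, 3 items, no_components=2.
user_biases = np.array([0.1, 0.2])
user_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
item_biases = np.array([0.0, 0.5, -0.5])
item_embeddings = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]])

print(predict_scores(1, [0, 2], user_biases, user_embeddings, item_biases, item_embeddings))
```

This is also how you would score all items for one user: pass `np.arange(n_items)` as `item_ids`, then sort the result to get a top-N list.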
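model.user_embedding_gradients
#a matrix the same shape as the user embeddings, [users, no_components]; it holds the per-parameter gradient accumulators used by the adagrad/adadelta learning schedule, not the embeddings themselves (those are in model.user_embeddings)
model.item_embedding_gradients
#same, shape [items, no_components]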
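These `*_embedding_gradients` arrays exist because adaptive schedules keep one running accumulator per parameter. The sketch below is a generic Adagrad step, not LightFM's internal code. It shows how the accumulated squared gradient shrinks later steps. The two-component parameter vector, the gradient, and the `eps` term are hypothetical choices for illustration.

```python
import numpy as np

def adagrad_step(param, grad, accum, learning_rate=0.05, eps=1e-6):
    """Generic Adagrad update: each entry's step is scaled by its own gradient history."""
    accum = accum + grad ** 2                                   # running sum of squared gradients
    param = param - learning_rate * grad / (np.sqrt(accum) + eps)
    return param, accum

# Hypothetical two-component embedding and a constant gradient.
param = np.array([1.0, 1.0])
accum = np.zeros_like(param)
grad = np.array([2.0, 0.0])

p1, a1 = adagrad_step(param, grad, accum)
p2, a2 = adagrad_step(p1, grad, a1)
print(p1, p2)
```

The first component moves less on the second step than on the first, because its accumulator grew. The second component has a zero gradient and does not move at all.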

7. Tools

a. Model evaluation

from lightfm.evaluation import auc_score
auc_score(model, data1).mean()  # auc_score returns one AUC per user; take the mean to check overall performance
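Each per-user value that `auc_score` returns is an AUC: the probability that a randomly chosen positive item is scored above a randomly chosen negative item for that user. This is a hedged NumPy sketch of that standard definition. The scores and positives are hypothetical. LightFM computes its AUC from ranks, so tie handling and the exclusion of `train_interactions` items may differ from this sketch.

```python
import numpy as np

def user_auc(scores, positives):
    """AUC for one user: share of (positive, negative) item pairs where the positive scores higher."""
    scores = np.asarray(scores, dtype=float)
    positives = np.asarray(positives, dtype=bool)
    pos, neg = scores[positives], scores[~positives]
    # Compare every positive score with every negative score; ties count as failures here.
    return (pos[:, None] > neg[None, :]).mean()

# Hypothetical scores and held-out positives for two users x four items.
scores = np.array([[0.9, 0.1, 0.5, 0.3],
                   [0.2, 0.8, 0.4, 0.6]])
positives = np.array([[1, 0, 0, 1],
                      [0, 1, 0, 0]])

aucs = np.array([user_auc(s, p) for s, p in zip(scores, positives)])
print(aucs, aucs.mean())
```

An AUC of 0.5 is what random scoring gives on average, so a mean well above 0.5 indicates the model has learned something.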

b. Data splitting

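from lightfm.cross_validation import random_train_test_split
train, test = random_train_test_split(data1, test_percentage=0.2)  # randomly splits the stored interactions of data1 into train and test matrices of the same shape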
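The idea behind a random train/test split of interactions is simple: shuffle the stored (user, item) entries, then cut them at a percentage. The result is two matrices with the same shape and no overlapping entries. The sketch below re-implements that idea in plain NumPy/SciPy so it can be read without lightfm installed. `random_split` is a hypothetical helper, not the library's code, and the sample matrix is hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix

def random_split(interactions, test_percentage=0.2, seed=None):
    """Shuffle stored interactions and cut them into train/test matrices of the same shape."""
    interactions = coo_matrix(interactions)
    rng = np.random.default_rng(seed)
    order = rng.permutation(interactions.nnz)
    rows, cols, data = interactions.row[order], interactions.col[order], interactions.data[order]
    cutoff = int((1.0 - test_percentage) * len(data))
    train = coo_matrix((data[:cutoff], (rows[:cutoff], cols[:cutoff])), shape=interactions.shape)
    test = coo_matrix((data[cutoff:], (rows[cutoff:], cols[cutoff:])), shape=interactions.shape)
    return train, test

# Hypothetical 4-user x 5-item matrix with 10 interactions.
dense = np.array([[1, 0, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1]])

train, test = random_split(dense, test_percentage=0.2, seed=42)
print(train.nnz, test.nnz)
```

Because the split is over individual interactions, a user can end up with all of their interactions in the test set. Check for this if per-user metrics look odd.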