Experiment 1: Process All the Data into a Matrix

```python
import sys
import scipy.sparse as sp


def load_data(ratingfile, testratio=0.1):
    """Read "user item score timestamp" lines, split them by time, and build
    a binary user-item training matrix plus a list of test ratings."""
    user_count = item_count = 0
    ratings = []
    with open(ratingfile) as f:
        for line in f:
            arr = line.strip().split()
            user_id = int(arr[0])
            item_id = int(arr[1])
            score = float(arr[2])
            timestamp = int(arr[3])  # Python 2's long() is plain int in Python 3
            ratings.append((user_id, item_id, score, timestamp))
            user_count = max(user_count, user_id)
            item_count = max(item_count, item_id)
    # IDs start at 0, so the matrix needs (max ID + 1) rows/columns
    user_count += 1
    item_count += 1

    ratings = sorted(ratings, key=lambda x: x[3])  # sort by timestamp
    test_count = int(len(ratings) * testratio)
    count = 0
    trainmatrix = sp.lil_matrix((user_count, item_count))
    testratings = []
    for rating in ratings:
        if count < len(ratings) - test_count:
            trainmatrix[rating[0], rating[1]] = 1  # older ratings -> training
        else:
            testratings.append(rating)  # most recent ratings -> test
        count += 1

    # "New" users have no ratings at all in the training matrix
    newusers = set()
    newratings = 0
    for u in range(user_count):
        if trainmatrix.getrowview(u).sum() == 0:
            newusers.add(u)
    for rating in testratings:
        if rating[0] in newusers:
            newratings += 1

    sys.stderr.write("data\t{}\n".format(ratingfile))
    sys.stderr.write("#users\t{}, #newuser: {}\n".format(user_count, len(newusers)))
    sys.stderr.write("#items\t{}\n".format(item_count))
    sys.stderr.write(
        "#ratings\t {} (train), {}(test), {}(#newtestratings)\n".format(
            trainmatrix.sum(), len(testratings), newratings))
    return trainmatrix, testratings
```


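lil_matrix (List of Lists format) is a SciPy sparse matrix type. lil_matrix and dok_matrix are the formats to use when building a sparse matrix incrementally, one entry at a time.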

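lil_matrix supports basic NumPy-style slicing and indexing. coo_matrix can also be used to build a matrix efficiently, from arrays of (row, column, value) triplets.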
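As a minimal sketch (the toy index lists are hypothetical), the snippet below builds the same binary user-item matrix twice: once by assigning entries into a lil_matrix, once in a single call to coo_matrix with `(data, (row, col))` arrays.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical toy data: user/item index pairs that have a rating.
users = [0, 0, 1, 2]
items = [1, 3, 0, 3]
shape = (3, 4)

# LIL: start empty and assign entries one at a time.
lil = sp.lil_matrix(shape)
for u, i in zip(users, items):
    lil[u, i] = 1

# COO: build the whole matrix at once from (data, (row, col)) arrays.
coo = sp.coo_matrix((np.ones(len(users)), (users, items)), shape=shape)

print(np.array_equal(lil.toarray(), coo.toarray()))  # True
```

One difference to keep in mind: when a COO matrix is densified or converted (e.g. `tocsr()`), duplicate (row, col) pairs are summed, whereas repeated LIL assignment simply overwrites the entry with 1. If the input may contain repeated pairs, deduplicate before using COO for a 0/1 matrix.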

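testratio=0.1: after sorting by timestamp, the most recent 1/10 of the ratings is used as the test set and stored in testratings; the remaining ratings fill the training matrix.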
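The split is chronological rather than random. A minimal pure-Python sketch of just that step, using a hypothetical helper `temporal_split` and made-up timestamps, assuming each rating is a `(user, item, score, timestamp)` tuple:

```python
def temporal_split(ratings, testratio=0.1):
    """Hypothetical helper: sort by timestamp (index 3) and hold out the
    newest int(len * testratio) ratings as the test set."""
    ordered = sorted(ratings, key=lambda r: r[3])
    test_count = int(len(ordered) * testratio)  # int() truncates
    cut = len(ordered) - test_count
    return ordered[:cut], ordered[cut:]


# Toy data: ten ratings with shuffled timestamps.
timestamps = [9, 1, 5, 3, 7, 0, 2, 8, 4, 6]
ratings = [(user, 0, 5.0, ts) for user, ts in enumerate(timestamps)]

train, test = temporal_split(ratings)
print(len(train), len(test))  # 9 1
print(test[0][3])             # 9, the latest timestamp
```

Because `int()` truncates, small datasets can end up with fewer test ratings than a straight 10% would suggest (or none at all when there are fewer than 10 ratings).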

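Building the matrix: trainmatrix[rating[0], rating[1]] = 1 sets every user-item pair that has a rating to 1; pairs without a rating stay 0, because the matrix is all zeros when it is created.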
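A small sketch of this binarization with hypothetical toy ratings: the score is discarded, and a user who rates the same item twice still yields a single 1.

```python
import scipy.sparse as sp

# Hypothetical ratings: (user, item, score, timestamp); user 1 rates item 2 twice.
ratings = [(0, 1, 4.0, 100), (1, 2, 3.0, 101), (1, 2, 5.0, 102)]

m = sp.lil_matrix((2, 3))   # freshly created: no stored entries, all zeros
print(m.nnz)                # 0
for user, item, _score, _ts in ratings:
    m[user, item] = 1       # record "has rated", ignore the score
print(m.toarray())
# [[0. 1. 0.]
#  [0. 0. 1.]]
```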

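xrange (Python 2) returns an xrange object that produces numbers on demand instead of building the whole list in memory. In Python 3, xrange no longer exists and range behaves the same way.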
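A quick way to see the lazy behavior in Python 3 (exact byte counts depend on the interpreter, so only the comparison is shown):

```python
import sys

# range, like Python 2's xrange, stores only start/stop/step.
lazy = range(10**9)
eager = list(range(10**5))   # a real list holds every element

print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True
print(lazy[123456789])       # 123456789, computed without building a list
```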

getrowview(i) is a lil_matrix method that "returns a view of the 'i'th row (without copying)" (from the official lil_matrix documentation).
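A sketch of how the row views are used to find users with no training ratings, on a hypothetical 3×4 matrix. The second approach, counting stored entries per row from the CSR `indptr` array, is a swapped-in vectorized alternative, not part of the original post.

```python
import numpy as np
import scipy.sparse as sp

m = sp.lil_matrix((3, 4))
m[0, 1] = 1
m[2, 3] = 1   # row 1 (user 1) stays empty

# Row-by-row check: getrowview avoids copying each row.
empty_rows = {u for u in range(m.shape[0]) if m.getrowview(u).sum() == 0}
print(empty_rows)  # {1}

# Vectorized alternative: in CSR, indptr[r+1] - indptr[r] is the number of
# stored entries in row r (this matrix stores only 1s, never explicit 0s).
row_nnz = np.diff(m.tocsr().indptr)
print(set(np.flatnonzero(row_nnz == 0).tolist()))  # {1}
```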

sys.stderr.write: the dataset statistics are written to standard error.
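Writing diagnostics to stderr keeps stdout free for actual results, so a redirect such as `python script.py > out.txt` still shows the statistics on the terminal. A minimal sketch with a hypothetical helper `report`, using Python 3's `print(..., file=sys.stderr)`:

```python
import sys


def report(name, n_users, n_items):
    """Hypothetical helper: print dataset statistics to standard error."""
    print("data\t{}".format(name), file=sys.stderr)
    print("#users\t{}\n#items\t{}".format(n_users, n_items), file=sys.stderr)


report("ratings.txt", 3, 4)  # hypothetical file name and counts
```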
