PySpark Logistic Regression

2022-05-06 20:30:10

I dug this up while tidying my files. It feels like code from a long time ago, but looking it over, it still holds up; at least the comments are fairly clear. Back then I really was a full-on package-calling man....


Logistic regression: the standard routine

from pyspark.ml.feature import VectorAssembler
import pandas as pd

# 1. Prepare the data - sample dataset
sample_dataset = [
    (0, "male", 37, 10, "no", 3, 18, 7, 4),
    (0, "female", 27, 4, "no", 4, 14, 6, 4),
    (0, "female", 32, 15, "yes", 1, 12, 1, 4),
    (0, "male", 57, 15, "yes", 5, 18, 6, 5),
    (0, "male", 22, 0.75, "no", 2, 17, 6, 3),
    (0, "female", 32, 1.5, "no", 2, 17, 5, 5),
    (0, "female", 22, 0.75, "no", 2, 12, 1, 3),
    (0, "male", 57, 15, "yes", 2, 14, 4, 4),
    (0, "female", 32, 15, "yes", 4, 16, 1, 2),
    (0, "male", 22, 1.5, "no", 4, 14, 4, 5),
    (0, "male", 37, 15, "yes", 2, 20, 7, 2),
    (0, "male", 27, 4, "yes", 4, 18, 6, 4),
    (0, "male", 47, 15, "yes", 5, 17, 6, 4),
    (0, "female", 22, 1.5, "no", 2, 17, 5, 4),
    (0, "female", 27, 4, "no", 4, 14, 5, 4),
    (0, "female", 37, 15, "yes", 1, 17, 5, 5),
    (0, "female", 37, 15, "yes", 2, 18, 4, 3),
    (0, "female", 22, 0.75, "no", 3, 16, 5, 4),
    (0, "female", 22, 1.5, "no", 2, 16, 5, 5),
    (0, "female", 27, 10, "yes", 2, 14, 1, 5),
    (1, "female", 32, 15, "yes", 3, 14, 3, 2),
    (1, "female", 27, 7, "yes", 4, 16, 1, 2),
    (1, "male", 42, 15, "yes", 3, 18, 6, 2),
    (1, "female", 42, 15, "yes", 2, 14, 3, 2),
    (1, "male", 27, 7, "yes", 2, 17, 5, 4),
    (1, "male", 32, 10, "yes", 4, 14, 4, 3),
    (1, "male", 47, 15, "yes", 3, 16, 4, 2),
    (0, "male", 37, 4, "yes", 2, 20, 6, 4)
]
columns = ["affairs", "gender", "age", "label", "children",
           "religiousness", "education", "occupation", "rating"]

# Build a pandas DataFrame first - convenient for holding the sample data
pdf = pd.DataFrame(sample_dataset, columns=columns)
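# The original listing jumps straight from the pandas DataFrame `pdf` to a Spark
# DataFrame `df` that is never created; presumably that step looked roughly like
# the sketch below (the app name is just a placeholder):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("affairs-lr").getOrCreate()
df = spark.createDataFrame(pdf)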

# 2. Feature selection: affairs is the target, the rest are features.
#    In real work this is the most tedious part - multiple tables, data cleaning.
df2 = df.select("affairs", "age", "religiousness", "education", "occupation", "rating")

# 3. Assemble features: merge the feature columns into a single "features" column.
#    Categorical data would first need one-hot encoding, which is rather tedious
#    (see the sketch right after this step).
# 3.1 Columns used to build the feature vector
colArray2 = ["age", "religiousness", "education", "occupation", "rating"]
# 3.2 Compute the feature vector
df3 = VectorAssembler().setInputCols(colArray2).setOutputCol("features").transform(df2)

# 4. Split into training and test sets (randomly)
trainDF, testDF = df3.randomSplit([0.8, 0.2])
# print("Training set:")
# trainDF.show(10)
# print("Test set:")
# testDF.show(10)

# 5. Train the model
from pyspark.ml.classification import LogisticRegression

# 5.1 Create the logistic regression estimator
lr = LogisticRegression()
# 5.2 Fit the model
model = lr.setLabelCol("affairs").setFeaturesCol("features").fit(trainDF)
# 5.3 Predict on the test data
model.transform(testDF).show()
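# The fitted LogisticRegressionModel also exposes the learned parameters, a small
# first step beyond pure package-calling (a sketch, not part of the original):
print(model.coefficients)   # one weight per column in colArray2
print(model.intercept)
# transform() appends rawPrediction, probability and prediction columns:
model.transform(testDF).select("affairs", "probability", "prediction").show(5)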

# TODO
# 6. Evaluation, cross-validation, persistence, packaging...
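The TODO in step 6 was never filled in. A minimal sketch of what the evaluation and persistence part could look like, reusing the variables above; BinaryClassificationEvaluator is the standard pyspark.ml evaluator for binary labels, and the save path is only a placeholder:

# 6.1 Evaluate on the test set (AUC)
from pyspark.ml.evaluation import BinaryClassificationEvaluator

predictions = model.transform(testDF)
evaluator = BinaryClassificationEvaluator(labelCol="affairs",
                                          rawPredictionCol="rawPrediction",
                                          metricName="areaUnderROC")
print("AUC:", evaluator.evaluate(predictions))

# 6.2 Persist the fitted model (placeholder path)
model.write().overwrite().save("/tmp/affairs_lr_model")

Cross-validation would follow the same pattern with CrossValidator and ParamGridBuilder from pyspark.ml.tuning.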

This is kept mainly as a historical note, and of course also as a cautionary example: if you just call the packages without understanding the underlying principles, you will find that ML is actually rather boring, at least judging from the code routine above.
