Scala DataFrame UDF: a summary

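import org.apache.spark.ml.linalg.Vector  // assumption: the "feature" column holds ml.linalg vectors
import org.apache.spark.sql.functions.udf

def cosinedistance(v1: Vector, v2: Vector): Double = {
  // Dot product. This line is truncated in the source and is reconstructed from the formula below.
  val x1x2 = v1.toArray.zip(v2.toArray).map { case (a, b) => a * b }.sum
  // Euclidean norms of the two vectors
  val x1sum = math.sqrt(v1.toArray.map(x1 => math.pow(x1, 2)).sum)
  val x2sum = math.sqrt(v2.toArray.map(x2 => math.pow(x2, 2)).sum)
  x1x2 / (x1sum * x2sum)
}
val udf_cosinedistance = udf(cosinedistance _)
// Pair the first row's vector with every row, then score each pair.
// Outside spark-shell, $"..." needs import spark.implicits._
val crossed = output.limit(1).select($"feature" as "one").crossJoin(output)
val outcomes = crossed.withColumn("test_result", udf_cosinedistance($"one", $"feature"))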
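
The formula used by the UDF can be checked without a Spark session. The following is a minimal sketch in plain Scala, assuming the vectors are available as `Array[Double]`; the names `CosineSketch` and `cosineSimilarity` are hypothetical and not from the original post. Note that the quantity computed is the cosine similarity (1 for parallel vectors, 0 for orthogonal ones), even though the post names it a distance.

```scala
object CosineSketch {
  // Cosine similarity: dot(a, b) / (||a|| * ||b||).
  // Assumes both arrays have the same length and neither is all zeros.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(y => y * y).sum)
    dot / (normA * normB)
  }

  def main(args: Array[String]): Unit = {
    println(cosineSimilarity(Array(1.0, 0.0), Array(0.0, 1.0))) // orthogonal: 0.0
    println(cosineSimilarity(Array(3.0, 4.0), Array(6.0, 8.0))) // parallel: 1.0
  }
}
```

A zero vector would make the denominator 0 and produce NaN, so a production UDF would probably want to guard against that case.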

Example 2: counting specific keywords

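Six points to note:

1. To give a UDF several parameters, use currying. Be careful with the syntax: the custom parameters go in their own parameter list on the left-hand side of the equals sign.

2. When a function returns several values, declare the return type (a tuple) in the function definition; the values returned must match it.

3. Design the function around Option[T] so that the job does not crash at runtime, and use map correctly on the Option.

4. When processing a DataFrame, mind the usage rules for UDFs: here the UDF takes only one input column.

5. To test for equality inside a DataFrame filter, use "===".

6. Pay close attention to the difference between split(".") and split('.'): the String version treats its argument as a regular expression, in which "." matches any character, while the Char version splits on a literal dot.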
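
Point 6 is easy to get wrong, so here is a small sketch in plain Scala of how the two calls behave. The object and method names are hypothetical and chosen only for illustration.

```scala
object SplitSketch {
  // String argument: interpreted as a regex, so "." matches every character.
  // All pieces are empty, and trailing empty strings are dropped.
  def splitRegex(s: String): List[String] = s.split(".").toList

  // Char argument: a literal separator.
  def splitChar(s: String): List[String] = s.split('.').toList

  // Escaping the dot makes the regex form literal as well.
  def splitEscaped(s: String): List[String] = s.split("\\.").toList

  def main(args: Array[String]): Unit = {
    println(splitRegex("a.b.c"))   // List()
    println(splitChar("a.b.c"))    // List(a, b, c)
    println(splitEscaped("a.b.c")) // List(a, b, c)
  }
}
```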

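import scala.io.Source
import org.apache.spark.sql.functions.udf

def get_set(num: Int): (Set[String], Set[String]) = {
  // The start of this line is lost in the source; only `_line").getLines().toList` survives.
  // Source.fromFile and the name `input` are reconstructed, and the path prefix is unknown.
  val input = Source.fromFile("…_line").getLines().toList
  val set1 = input(0).split(",").map(_.trim).toSet
  val set2 = input(1).split(",").map(_.trim).toSet
  (set1, set2)
}
val (set1, set2) = get_set(1)

// The bodies of keywordscount and countall are missing from the source; only the signatures survive.
def keywordscount(set1: Set[String], set2: Set[String])(str: String): Option[Int] = ???
def countall(set1: Set[String], set2: Set[String])(str: String): Option[Int] = ???

// Curried wrapper: the first parameter list fixes the two sets, and the resulting UDF takes one column.
def udf_keywordcount(set1: Set[String], set2: Set[String]) = udf(countall(set1, set2) _)
val df_final = df.withColumn("count1", udf_keywordcount(set1, set2)($"descrip"))
df_final.filter(df_final("count1") === 1).select("count1").count().toInt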
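
Points 1 to 3 can be tried out in plain Scala before wiring anything into Spark. The sketch below is only an illustration: the names `CurrySketch`, `getSets` and `countAll` are hypothetical, and the counting rule (the number of whitespace-separated words found in either set) is an assumption, since the original function bodies are not preserved.

```scala
object CurrySketch {
  // Point 2: the declared return type is a tuple, and the last expression matches it.
  def getSets(lines: List[String]): (Set[String], Set[String]) = {
    val set1 = lines(0).split(",").map(_.trim).toSet
    val set2 = lines(1).split(",").map(_.trim).toSet
    (set1, set2)
  }

  // Points 1 and 3: the first parameter list carries the custom arguments,
  // the second takes the column value. Option(str) turns null into None,
  // and map only runs when a value is present.
  // The counting rule itself is hypothetical.
  def countAll(set1: Set[String], set2: Set[String])(str: String): Option[Int] =
    Option(str).map { s =>
      s.split("\\s+").count(w => set1.contains(w) || set2.contains(w))
    }

  def main(args: Array[String]): Unit = {
    val (set1, set2) = getSets(List("spark, scala", "udf"))
    // Fixing the first parameter list yields a String => Option[Int] function,
    // the shape that udf(...) accepts for a single input column.
    // (The post writes the same thing as countall(set1, set2) _.)
    val counter: String => Option[Int] = countAll(set1, set2)
    println(counter("a scala udf example")) // Some(2)
    println(counter(null))                  // None
  }
}
```

When such a function is wrapped with `udf`, a `None` result shows up as a null in the output column, so null inputs do not abort the job.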
