Implementing Sorting in Spark


Question: use Spark to sort a set of user records, first by face value in descending order; if two users have the same face value, order them by age in ascending order. For example, among users who all have face value 99, the youngest comes first.

1. A User class that extends Ordered and implements Serializable. The listing below is a minimal runnable sketch; the sample data it parallelizes is assumed for illustration (fields: id, name, face value, age).

package cn.edu360.spark.day06

import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Custom sorting.
 * Created by zhangjingcun on 2018/9/27 17:13.
 */
object CustomSort1 {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.WARN)
    val conf = new SparkConf().setAppName("CustomSort1").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // sample data (assumed, for illustration): "id name faceValue age"
    val lines: RDD[String] = sc.parallelize(Array("1 tom 99 28", "2 jerry 99 30", "3 kitty 90 28"))

    // parse every line into a User1
    val userRDD: RDD[User1] = lines.map(line => {
      val fields = line.split(" ")
      new User1(fields(0).toLong, fields(1), fields(2).toInt, fields(3).toInt)
    })

    // sort: the ordering comes from User1's compare method
    val sorted: RDD[User1] = userRDD.sortBy(u => u)

    // collect the data
    val result: Array[User1] = sorted.collect()
    println(result.toBuffer)

    sc.stop()
  }
}

// the class extends Ordered for the sort rule, and Serializable because its instances are shuffled across the cluster
class User1(val id: Long, val name: String, val fv: Int, val age: Int)
  extends Ordered[User1] with Serializable {

  // face value descending; if equal, age ascending
  override def compare(that: User1): Int = {
    if (this.fv == that.fv) this.age - that.age
    else that.fv - this.fv
  }

  override def toString: String = s"name: $name, fv: $fv, age: $age"
}
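
Why sortBy(u => u) compiles here: Spark's sortBy needs an implicit Ordering for the key type, and because User1 extends Ordered[User1] (hence Comparable[User1]), the Scala standard library derives one automatically via Ordering.ordered. A quick sketch of the implicit that gets picked up, assuming the User1 class above:

// the same implicit Ordering[User1] that sortBy resolves behind the scenes
val userOrdering: Ordering[User1] = implicitly[Ordering[User1]]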

2. The same logic with a case class that extends Ordered: no explicit Serializable and no new keyword are needed (same assumed sample data as above).

package cn.edu360.spark.day06

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

/**
 * Created by zhangjingcun on 2018/9/27 17:32.
 */
object CustomSort2 {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.WARN)
    val conf = new SparkConf().setAppName("CustomSort2").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // sample data (assumed, for illustration): "id name faceValue age"
    val lines: RDD[String] = sc.parallelize(Array("1 tom 99 28", "2 jerry 99 30", "3 kitty 90 28"))

    // parse every line into a User2 (no new keyword needed for a case class)
    val userRDD: RDD[User2] = lines.map(line => {
      val fields = line.split(" ")
      User2(fields(0).toLong, fields(1), fields(2).toInt, fields(3).toInt)
    })

    // sort
    val sorted: RDD[User2] = userRDD.sortBy(u => u)

    // collect the data
    val result: Array[User2] = sorted.collect()
    println(result.toBuffer)

    sc.stop()
  }
}

// a case class can be constructed without the new keyword
// and is already serializable, so nothing extra needs to be implemented
case class User2(id: Long, name: String, fv: Int, age: Int) extends Ordered[User2] {
  // face value descending; if equal, age ascending
  override def compare(that: User2): Int = {
    if (this.fv == that.fv) this.age - that.age
    else that.fv - this.fv
  }

  override def toString: String = s"name: $name, fv: $fv, age: $age"
}

3. Keep the records as plain tuples and build a case class only as the sort key inside sortBy (same assumed sample data).

package cn.edu360.spark.day06

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

/**
 * Created by zhangjingcun on 2018/9/27 17:37.
 */
object CustomSort3 {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.WARN)
    val conf = new SparkConf().setAppName("CustomSort3").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // sample data (assumed, for illustration): "id name faceValue age"
    val lines: RDD[String] = sc.parallelize(Array("1 tom 99 28", "2 jerry 99 30", "3 kitty 90 28"))

    // keep the records as plain tuples of (id, name, fv, age)
    val userRDD: RDD[(Long, String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      (fields(0).toLong, fields(1), fields(2).toInt, fields(3).toInt)
    })

    // sort: a User3 is built only as the sort key; the data itself stays a tuple
    val sorted: RDD[(Long, String, Int, Int)] = userRDD.sortBy(tp => User3(tp._1, tp._2, tp._3, tp._4))

    // collect the data
    val result: Array[(Long, String, Int, Int)] = sorted.collect()
    println(result.toBuffer)

    sc.stop()
  }
}

// a case class needs no new keyword
// and no explicit serialization
case class User3(id: Long, name: String, fv: Int, age: Int) extends Ordered[User3] {
  // face value descending; if equal, age ascending
  override def compare(that: User3): Int = {
    if (this.fv == that.fv) this.age - that.age
    else that.fv - this.fv
  }

  override def toString: String = s"name: $name, fv: $fv, age: $age"
}
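
This works because RDD.sortBy only needs a key-extraction function together with an implicit Ordering for the key type; the records themselves keep their tuple type. The relevant signature in the Spark RDD API is roughly:

def sortBy[K](f: T => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T]

Here K is User3, whose Ordering is derived from the Ordered trait it implements.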

4. Sort the tuples directly with an implicit Ordering, so no custom class is needed at all (same assumed sample data).

package cn.edu360.spark.day06

import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

/**
 * Created by zhangjingcun on 2018/9/27 17:41.
 */
object CustomSort4 {
  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.WARN)
    val conf = new SparkConf().setAppName("CustomSort4").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // sample data (assumed, for illustration): "id name faceValue age"
    val lines: RDD[String] = sc.parallelize(Array("1 tom 99 28", "2 jerry 99 30", "3 kitty 90 28"))

    // keep the records as plain tuples of (id, name, fv, age)
    val tpRDD: RDD[(Long, String, Int, Int)] = lines.map(line => {
      val fields = line.split(" ")
      (fields(0).toLong, fields(1), fields(2).toInt, fields(3).toInt)
    })

    // exploit how tuples compare: the first element is compared first, and only if it is
    // equal does the comparison move on to the next element; negating fv makes it descending
    implicit val rules: Ordering[(Long, String, Int, Int)] =
      Ordering[(Int, Int)].on[(Long, String, Int, Int)](t => (-t._3, t._4))

    val sorted = tpRDD.sortBy(t => t)

    // collect the data
    val result: Array[(Long, String, Int, Int)] = sorted.collect()
    println(result.toBuffer)

    sc.stop()
  }
}
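
The tuple rule used above can be checked in plain Scala, outside Spark; the values below are illustrative only:

// hypothetical quick check: face value descending (via the negated third field), then age ascending
val rule = Ordering[(Int, Int)].on[(Long, String, Int, Int)](t => (-t._3, t._4))
val people = List((1L, "tom", 99, 30), (2L, "jerry", 99, 28), (3L, "kitty", 90, 25))
println(people.sorted(rule))
// List((2,jerry,99,28), (1,tom,99,30), (3,kitty,90,25))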
