2020 Big Data Processing: Comprehensive Exercise

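Mapper (excerpt):

@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // ... (field splitting and the first branch of the if/else chain are missing from the original)
    if (/* ... */) {
        // ...
    } else if (line.length > 10) {
        // ... (newRelatedId is built here; missing from the original)
        // Drop the trailing comma
        relatedIds = newRelatedId.substring(0, newRelatedId.lastIndexOf(","));
    }
    // Process the whole record: put the cleaned category in place of field 3
    String data = value.toString().replace(line[3], category);
    // Original related-ID text: everything from the first related ID (field 9) onward
    String oRelatedIds = value.toString().substring(value.toString().indexOf(line[9]));
    // Final data: swap in the comma-joined related IDs
    String finalData = data.replace(oRelatedIds, relatedIds);
    // context.write(key, new Text(finalData));
}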
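The map() logic above rebuilds each record with `String.replace`. The plain-Java sketch below performs the same cleaning without Hadoop. It assumes tab-separated input in which field 3 is the category and fields 9 onward are related video IDs. The class name `VideoEtl`, the 9-field drop threshold and the space-removing category rule are hypothetical stand-ins for the parts missing from the original.

```java
import java.util.StringJoiner;

// Hypothetical class; plain Java, no Hadoop dependency.
public class VideoEtl {

    // Assumed category cleanup: remove spaces ("People & Blogs" -> "People&Blogs").
    static String cleanCategory(String raw) {
        return raw.replace(" ", "");
    }

    // Returns the cleaned record, or null when it has fewer than 9 fields (assumed threshold).
    public static String clean(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 9) {
            return null;
        }
        // Rebuild fields 0..8 by index instead of using String.replace
        StringJoiner out = new StringJoiner("\t");
        for (int i = 0; i < 9; i++) {
            out.add(i == 3 ? cleanCategory(fields[i]) : fields[i]);
        }
        if (fields.length > 9) {
            // Fields 9.. are related IDs: join them with "," so they form one column
            StringJoiner related = new StringJoiner(",");
            for (int i = 9; i < fields.length; i++) {
                related.add(fields[i]);
            }
            out.add(related.toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "v1\tu1\t100\tPeople & Blogs\t60\t10\t4.5\t3\t2\tr1\tr2\tr3";
        System.out.println(clean(raw));
    }
}
```

Rebuilding by field index avoids two pitfalls of the replace-based version. First, `replace(line[3], category)` rewrites every occurrence of the category text anywhere in the line. Second, `indexOf(line[9])` returns the first match of the first related ID, which can land earlier in the line if that text also appears inside another field.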

Reducer:

public class Reducer02 extends Reducer {
    // ... (generic type parameters and the body are missing from the original)
}

Driver:

public class Driver02 extends Configured implements Tool {
    // ... (run() and the job setup are missing from the original)
    public static void main(String[] args) throws Exception {
        // ...
    }
}

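Util:

public class Util {
    // ... (method signature missing from the original)
        if (/* ... */) {
            // ... (categorys is built here; missing from the original)
            // Drop the trailing comma from the concatenated categories
            return categorys.substring(0, categorys.lastIndexOf(","));
        }
        return category;
    }
}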
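Util's fragment ends with `categorys.substring(0, categorys.lastIndexOf(","))`. This suggests the common idiom of appending each token plus a comma and then cutting off the trailing comma. The sketch below contrasts that idiom with `String.join`. The class and method names are hypothetical, and it assumes the tokens arrive as a `List<String>`. The idiom fails on empty input: `lastIndexOf(",")` returns -1, so `substring(0, -1)` throws.

```java
import java.util.List;

// Hypothetical helper class comparing two ways to build a comma-separated string.
public class JoinUtil {

    // Append "token," for each element, then cut at the last comma.
    // Throws StringIndexOutOfBoundsException for an empty list (lastIndexOf returns -1).
    public static String appendThenTrim(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t).append(",");
        }
        String joined = sb.toString();
        return joined.substring(0, joined.lastIndexOf(","));
    }

    // Same result for non-empty input; returns "" for an empty list instead of throwing.
    public static String joinSafe(List<String> tokens) {
        return String.join(",", tokens);
    }

    public static void main(String[] args) {
        List<String> ids = List.of("r1", "r2", "r3");
        System.out.println(appendThenTrim(ids)); // r1,r2,r3
        System.out.println(joinSafe(ids));       // r1,r2,r3
    }
}
```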

Load the preprocessed data into Hive

ori (TEXTFILE staging table):

create table video_user_ori( uploader string, videos string, friends string) row format delimited fields terminated by "," stored as textfile;

orc:

create table video_user_orc( uploader string, videos string, friends string) row format delimited fields terminated by "," stored as orc;

load data local inpath '/opt/user.txt' into table video_user_ori;

insert into table video_user_orc select * from video_user_ori;

Run HiveQL queries on the loaded data

hive -e "select * from video.video_orc where rate=5" > 5.txt

Store the Hive query results in HBase

Statement that creates the external table rate:

create external table rate( videoid string, uploader string, age string, category string, length string, views string, rate string, ratings string, comments string, relatedid string) row format delimited fields terminated by "\t" stored as textfile;

load data local inpath '/opt/5.txt' into table rate;
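`hive -e` prints result columns separated by tabs, which matches the `"\t"` delimiter declared for the rate table. The sketch below is a simplified model of how one delimited text row fills the ten declared columns. Field i goes to column i, missing trailing fields become NULL, and surplus fields are discarded. It approximates Hive's default text SerDe rather than reproducing it: escaping and the `\N` null marker are ignored. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of loading one tab-delimited line into the rate table.
public class RateRowModel {

    // Column order of the rate external table
    static final String[] COLUMNS = {
        "videoid", "uploader", "age", "category", "length",
        "views", "rate", "ratings", "comments", "relatedid"
    };

    // Field i -> column i; missing fields become null, extra fields are dropped.
    public static Map<String, String> parse(String line) {
        String[] fields = line.split("\t", -1); // -1 keeps trailing empty fields
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], i < fields.length ? fields[i] : null);
        }
        return row;
    }

    public static void main(String[] args) {
        // Related IDs joined with "," stay together in relatedid
        System.out.println(parse("v1\tu1\t100\tMusic\t60\t10\t5\t3\t2\tr1,r2").get("relatedid"));
        // Related IDs left tab-separated: only the first reaches relatedid
        System.out.println(parse("v1\tu1\t100\tMusic\t60\t10\t5\t3\t2\tr1\tr2").get("relatedid"));
    }
}
```

The second call shows why the mapper joins related IDs with commas. Left tab-separated, every ID after the first would be an extra field and would not reach relatedid.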

Create the Hive-HBase mapping table:

create table video.hbase_rate( videoid string, uploader string, age string, category string, length string, views string, rate string, ratings string, comments string, relatedid string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' tblproperties("hbase.table.name" = "hbase_rate");
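Hive's HBase integration documentation describes an `hbase.columns.mapping` SERDEPROPERTIES entry. It tells the storage handler which Hive column becomes the HBase row key (`:key`) and which `family:qualifier` each remaining column maps to. The DDL above does not set it. The sketch below generates such a mapping string from the column list. It assumes the first column (videoid) is the row key and that all other columns share one column family. The class name and the family name `info` are hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical generator for an hbase.columns.mapping value.
public class HBaseMappingSketch {

    // First Hive column -> ":key" (row key); every other column -> family:column.
    public static String columnsMapping(List<String> hiveColumns, String family) {
        if (hiveColumns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        StringJoiner mapping = new StringJoiner(",");
        mapping.add(":key");
        for (String col : hiveColumns.subList(1, hiveColumns.size())) {
            mapping.add(family + ":" + col);
        }
        return mapping.toString();
    }

    public static void main(String[] args) {
        List<String> cols = List.of("videoid", "uploader", "age", "category", "length",
                "views", "rate", "ratings", "comments", "relatedid");
        // "info" is a hypothetical column family name
        System.out.println(columnsMapping(cols, "info"));
    }
}
```

The generated value would go in a `with serdeproperties ("hbase.columns.mapping" = "...")` clause placed between `stored by '...'` and `tblproperties(...)`.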

Insert the data:

Result:
