MySQL: SQL statements for finding and removing duplicate rows

2022-09-24 20:42:10

Suppose there is a table user with the columns id, nick_name, password, email, and phone.
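To try the statements in this article, a small test table is useful. The following is only a sketch: the article names the columns but not their types, so the types, lengths, and sample rows here are assumptions.

```sql
-- Hypothetical schema: only the column names come from the article;
-- the types, lengths, and sample rows are assumptions for illustration.
CREATE TABLE `user` (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  nick_name VARCHAR(64)  NOT NULL,
  password  VARCHAR(64)  NOT NULL,
  email     VARCHAR(128),
  phone     VARCHAR(32)
);

INSERT INTO `user` (nick_name, password, email, phone) VALUES
  ('alice', 'pw1', 'a1@example.com', '100'),  -- same nick_name and password as the next row
  ('alice', 'pw1', 'a2@example.com', '101'),
  ('alice', 'pw2', 'a3@example.com', '102'),  -- same nick_name only
  ('bob',   'pw3', 'b1@example.com', '103'),  -- no duplicate
  ('carol', 'pw4', 'c1@example.com', '104'),  -- same nick_name and password as the next row
  ('carol', 'pw4', 'c2@example.com', '105');
```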

1. Single column (nick_name)

Find every row whose nick_name appears more than once

select * from user where nick_name in (select nick_name from user group by nick_name having count(nick_name)>1);
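Before pulling the full rows, it can help to see which values are duplicated and how often. A minimal sketch; the alias cnt is just an illustrative name, not something from the article:

```sql
-- List each duplicated nick_name together with the number of rows sharing it.
SELECT nick_name, COUNT(*) AS cnt
FROM `user`
GROUP BY nick_name
HAVING COUNT(*) > 1
ORDER BY cnt DESC;
```

The inner subquery of the statement above is the same grouping without the count column.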

Find the row with the largest id in each group of duplicates

select * from user where id in (select max(id) from user group by nick_name having count(nick_name)>1);

Find the redundant rows, that is, every duplicate except the row with the smallest id in its group

select * from user where nick_name in (select nick_name from user group by nick_name having count(nick_name)>1) and id not in (select min(id) from user group by nick_name having count(nick_name)>1);
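The same result can also be written as a join against a derived table, which some readers find easier to follow than two IN subqueries. This is a sketch rather than the article's method; it assumes id is unique, and the aliases u, d, and keep_id are illustrative names.

```sql
-- For each duplicated nick_name, compute the id to keep (the smallest),
-- then return every other row in that group.
SELECT u.*
FROM `user` AS u
JOIN (
  SELECT nick_name, MIN(id) AS keep_id
  FROM `user`
  GROUP BY nick_name
  HAVING COUNT(*) > 1
) AS d
  ON u.nick_name = d.nick_name
 AND u.id <> d.keep_id;
```

Running a SELECT like this first is a cheap way to check which rows a later DELETE would remove.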

Delete the redundant rows, keeping only the row with the smallest id in each group

delete from user where nick_name in (select nick_name from (select nick_name from user group by nick_name having count(nick_name)>1) as tmp1) and id not in (select id from (select min(id) as id from user group by nick_name having count(nick_name)>1) as tmp2);
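The extra "select ... from (...) as tmp" layer in the DELETE is there because MySQL rejects a subquery that reads directly from the table being deleted from (error 1093, "You can't specify target table ... for update in FROM clause"). An alternative that avoids the nesting is MySQL's multi-table DELETE with a self-join. This is a sketch, not the article's statement; it assumes id is unique and nick_name is not NULL, and u1 and u2 are illustrative aliases.

```sql
-- Delete every row for which another row with the same nick_name
-- and a smaller id exists; the smallest id in each group survives.
DELETE u1
FROM `user` AS u1
JOIN `user` AS u2
  ON u1.nick_name = u2.nick_name
 AND u1.id > u2.id;
```

On a large table, this join benefits from an index on nick_name, and it is worth testing on a copy of the data first.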

2. Multiple columns (nick_name, password)

Find every row whose (nick_name, password) pair appears more than once

select * from user where (nick_name,password) in (select nick_name,password from user group by nick_name,password having count(nick_name)>1);

Find the row with the largest id in each group of duplicates

select * from user where id in (select max(id) from user group by nick_name,password having count(nick_name)>1);

Find the redundant rows in each group of duplicates, excluding the row with the smallest id

select * from user where (nick_name,password) in (select nick_name,password from user group by nick_name,password having count(nick_name)>1) and id not in (select min(id) from user group by nick_name,password having count(nick_name)>1);

Delete the redundant rows, keeping only the row with the smallest id in each group

delete from user where (nick_name,password) in (select nick_name,password from (select nick_name,password from user group by nick_name,password having count(nick_name)>1) as tmp1) and id not in (select id from (select min(id) id from user group by nick_name,password having count(nick_name)>1) as tmp2);
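As with the single-column case, a self-join DELETE is a possible alternative, and a unique index can stop new duplicates from appearing afterwards. Both statements below are sketches under assumptions: id is unique, neither column is NULL, the columns are short enough to be indexed, and the index name uk_nick_name_password is a made-up example.

```sql
-- Delete every row for which another row with the same
-- (nick_name, password) pair and a smaller id exists.
DELETE u1
FROM `user` AS u1
JOIN `user` AS u2
  ON u1.nick_name = u2.nick_name
 AND u1.password  = u2.password
 AND u1.id > u2.id;

-- Optional follow-up: once the duplicates are gone, make the pair unique
-- so that later inserts of an existing pair fail instead of piling up.
ALTER TABLE `user`
  ADD UNIQUE INDEX uk_nick_name_password (nick_name, password);
```

The ALTER TABLE fails if any duplicate pair is still present, so it also serves as a check that the cleanup was complete.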
