Slow SQL Caused by Redundant Rows from Joins

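At work I ran into a slow SQL query caused by multiple joins. The data volume was small, yet the query was very slow. I found a blog post on the topic and used it to solve the problem.

The business case itself is not recorded here; the problem is reproduced using the blog post's content:

Original SQL: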

select distinct abc.pro_col1, abc.col3
from t0 p
inner join t1 abc on p.id = abc.par_col2
inner join t2 s on s.col3 = abc.col3
inner join t3 po on po.id = s.col4
where p.state = 2 and po.state = 3
order by abc.pro_col1, abc.col3;

The SQL above is equivalent to:

select distinct abc.pro_col1, abc.col3
from t0 p, t1 abc, t2 s, t3 po
where p.id = abc.par_col2
and s.col3 = abc.col3
and po.id = s.col4
and p.state = 2 and po.state = 3
order by abc.pro_col1, abc.col3;

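Analysis and optimization:

Semantically, this SQL joins several tables and then takes the distinct values of two columns from just one of them.

However, every join can produce redundant rows: if a key value matches several rows on the other side, each input row is repeated once per match. The intermediate result set therefore grows with every join.

Suggested change: make every join step emit only distinct values, so the redundancy is removed before the next join. In other words, repeated joins keep enlarging the result set (much like a Cartesian product), so the filter conditions should also be applied as early as possible.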
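The growth described above can be observed directly by counting rows. The following is a minimal sketch, not taken from the original post: it uses SQLite through Python's `sqlite3` module only so that it runs without a database server, and the tables `a`, `b`, `c` and the value `N = 30` are hypothetical.

```python
import sqlite3

N = 30  # duplicate rows per table (illustrative value, an assumption)

conn = sqlite3.connect(":memory:")
for name in ("a", "b", "c"):
    # Every table holds N rows that all share the same join key, 1.
    conn.execute(f"create table {name} (k int)")
    conn.executemany(f"insert into {name} values (?)", [(1,)] * N)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Each additional join multiplies the row count by N.
two_way = scalar("select count(*) from a join b on a.k = b.k")
three_way = scalar(
    "select count(*) from a join b on a.k = b.k join c on b.k = c.k"
)
# The final distinct collapses all of those rows back into one.
distinct_rows = scalar(
    "select count(*) from ("
    " select distinct a.k from a join b on a.k = b.k join c on b.k = c.k"
    ")"
)

print(two_way, three_way, distinct_rows)
```

With N duplicate keys per table, the two-way join yields N*N rows and the three-way join N*N*N rows, although only one distinct value survives at the end. All of that intermediate work is what the rewrite tries to avoid.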

select distinct pro_col1, col3 from
(
select distinct t1.pro_col1, t1.col3, s.col4 from
(
select distinct abc.pro_col1, abc.col3 from
t1 abc
inner join t0 p
on (p.id = abc.par_col2 and p.state = 2)
) t1
inner join t2 s
on (s.col3 = t1.col3)
) t2
inner join t3 po
on (po.id = t2.col4 and po.state = 3)
order by t2.pro_col1, t2.col3;
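A rewrite like this is only worthwhile if it returns the same rows as the original. The sketch below checks that on a small synthetic data set. The column types and the sample data are assumptions made up for illustration (the post does not give the real schema), and SQLite via Python's `sqlite3` is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the columns referenced by the queries are created.
conn.executescript("""
create table t0 (id int, state int);
create table t1 (pro_col1 int, par_col2 int, col3 int);
create table t2 (col3 int, col4 int);
create table t3 (id int, state int);
""")
# Deterministic sample data with many duplicate join keys.
conn.executemany("insert into t0 values (?, ?)",
                 [(i, 2 if i % 2 == 0 else 1) for i in range(6)])
conn.executemany("insert into t1 values (?, ?, ?)",
                 [(i % 3, i % 6, i % 4) for i in range(48)])
conn.executemany("insert into t2 values (?, ?)",
                 [(i % 4, i % 5) for i in range(40)])
conn.executemany("insert into t3 values (?, ?)",
                 [(i, 3 if i % 2 else 2) for i in range(5)])

original = """
select distinct abc.pro_col1, abc.col3
from t0 p
inner join t1 abc on p.id = abc.par_col2
inner join t2 s on s.col3 = abc.col3
inner join t3 po on po.id = s.col4
where p.state = 2 and po.state = 3
order by abc.pro_col1, abc.col3
"""

rewritten = """
select distinct pro_col1, col3 from (
  select distinct t1.pro_col1, t1.col3, s.col4 from (
    select distinct abc.pro_col1, abc.col3
    from t1 abc
    inner join t0 p on (p.id = abc.par_col2 and p.state = 2)
  ) t1
  inner join t2 s on (s.col3 = t1.col3)
) t2
inner join t3 po on (po.id = t2.col4 and po.state = 3)
order by t2.pro_col1, t2.col3
"""

rows_original = conn.execute(original).fetchall()
rows_rewritten = conn.execute(rewritten).fetchall()
print(rows_original == rows_rewritten)
```

Moving `p.state = 2` and `po.state = 3` from the `where` clause into the `on` clauses is safe here because every join is an inner join; with outer joins the two placements would not be equivalent.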

An example follows:

postgres=# create table rt1(id int, info text);
CREATE TABLE
postgres=# create table rt2(id int, info text);
CREATE TABLE
postgres=# create table rt3(id int, info text);
CREATE TABLE
postgres=# create table rt4(id int, info text);
CREATE TABLE
postgres=# insert into rt1 select generate_series(1,1000), 'test';
INSERT 0 1000
postgres=# insert into rt2 select 1, 'test' from generate_series(1,1000);
INSERT 0 1000
postgres=# insert into rt3 select 1, 'test' from generate_series(1,1000);
INSERT 0 1000
postgres=# insert into rt4 select 1, 'test' from generate_series(1,1000);
INSERT 0 1000

Comparison:

Optimized query:

Judging by the execution times, the optimized query is far more than just a little faster.
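A comparison of this kind can be sketched as follows. The two queries against `rt1` to `rt4` are assumptions modelled on the rewrite shown earlier, not the queries from the original post. The sketch runs on SQLite through Python's `sqlite3` rather than PostgreSQL, and `N` is reduced from 1000 to 60 so that the plain join finishes quickly. Measured times depend on the machine and on the database's planner, so none are quoted here.

```python
import sqlite3
import time

N = 60  # the example above uses 1000 rows; reduced here (assumption)

conn = sqlite3.connect(":memory:")
for name in ("rt1", "rt2", "rt3", "rt4"):
    conn.execute(f"create table {name} (id int, info text)")
# rt1 holds ids 1..N; rt2, rt3 and rt4 hold N copies of id = 1.
conn.executemany("insert into rt1 values (?, 'test')",
                 [(i,) for i in range(1, N + 1)])
for name in ("rt2", "rt3", "rt4"):
    conn.executemany(f"insert into {name} values (?, 'test')", [(1,)] * N)

# Hypothetical "before" query: join everything, deduplicate at the end.
plain = """
select distinct rt1.id
from rt1
join rt2 on rt1.id = rt2.id
join rt3 on rt2.id = rt3.id
join rt4 on rt3.id = rt4.id
"""

# Hypothetical "after" query: deduplicate after every join step.
staged = """
select distinct t2.id from (
  select distinct t1.id from (
    select distinct rt1.id from rt1 join rt2 on rt1.id = rt2.id
  ) t1
  join rt3 on t1.id = rt3.id
) t2
join rt4 on t2.id = rt4.id
"""


def timed(sql):
    """Return the query's rows and its wall-clock time in seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


plain_rows, plain_seconds = timed(plain)
staged_rows, staged_seconds = timed(staged)
print(f"plain:  {plain_rows} in {plain_seconds:.4f}s")
print(f"staged: {staged_rows} in {staged_seconds:.4f}s")
```

Both queries return the single row `(1,)`. The plain form has to produce N*N*N joined rows before deduplicating, while the staged form never carries more than N*N rows through any one step. On PostgreSQL, running both forms under `explain analyze` would show the per-node row counts directly.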
