Using B+ Tree Indexes

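Not every column that appears in a query condition needs an index. As for when to add a B+ tree index, my experience is that it only pays off when a query touches a small fraction of the table. Columns such as gender, region, or type take very few distinct values, that is, they have low selectivity. For example: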

select * from student where *** = 'm'

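Gender can only take the values 'm' and 'f', so the SQL statement above may return about 50% of the table's rows, and adding a B+ tree index here is completely unnecessary. Conversely, if a column takes a wide range of values with almost no duplicates, that is, it has high selectivity, a B+ tree index is a good fit. A name column is one example, since most applications essentially do not allow duplicate names.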
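One way to put a number on selectivity is to divide the count of distinct values by the total row count; values near 1 suggest a highly selective column. The sketch below assumes the student table has a column called name, which is a hypothetical name because the schema is not shown here.

```sql
-- Selectivity = distinct values / total rows.
-- `name` is an assumed column name used only for illustration.
SELECT COUNT(DISTINCT name) / COUNT(*) AS name_selectivity
FROM student;
```

Running the same calculation against a gender-like column gives a value close to 0 on any table of reasonable size, because only two distinct values are spread over all the rows.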

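So when a query filters on a highly selective column and retrieves only a small part of the table, adding a B+ tree index on that column is well worth it. But if the column is highly selective and the query still retrieves most of the table's rows, MySQL will not use the B+ tree index. Let's look at an example: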

mysql> show index from info\G;

*************************** 1. row ***************************

table: info

non_unique: 0

key_name: primary

seq_in_index: 1

column_name: id

collation: a

cardinality: 356639

sub_part: null

packed: null

null:

index_type: btree

comment:

index_comment:

*************************** 2. row ***************************

table: info

non_unique: 1

key_name: index_link_family

seq_in_index: 1

column_name: link_family

collation: a

cardinality: 9385

sub_part: 255

packed: null

null: yes

index_type: btree

comment:

index_comment:

*************************** 3. row ***************************

table: info

non_unique: 1

key_name: index_date

seq_in_index: 1

column_name: date

collation: a

cardinality: 356639

sub_part: null

packed: null

null:

index_type: btree

comment:

index_comment:

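The info table has about 500,000 rows. Its date column is of a date type and carries a non-unique index named index_date. Let's look at how the following two SQL statements are executed: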

mysql> explain select * from info where date = '2006-07-26 15:56:01'\G;

*************************** 1. row ***************************

id: 1

select_type: ******

table: info

type: ref

possible_keys: index_date

key: index_date

key_len: 8

ref: const

rows: 2

extra:

1 row in set (0.00 sec)

error:

no query specified

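We can see that the index_date index is used, which matches the principle stated earlier: high selectivity, and only a few rows selected from the table. But now run the following statement: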

mysql> explain select * from info where date > '2006-07-26 15:56:01'\G;

*************************** 1. row ***************************

id: 1

select_type: ******

table: info

type: all

possible_keys: index_date

key: null

key_len: null

ref: null

rows: 356639

extra: using where

1 row in set (0.00 sec)

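possible_keys is still index_date, but key, the index the optimizer actually uses, shows null. Why? Because this query violates the principle above: although the values in the date column are highly selective, the rows we retrieve make up a very large part of the table.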

mysql> select @a:=count(id) from info where date > '2006-07-26 15:56:01';

+---------------+

| @a:=count(id) |

+---------------+

| 452549 |

+---------------+

1 row in set (0.18 sec)

mysql> select @b:=count(id) from info ;

+---------------+

| @b:=count(id) |

+---------------+

| 452554 |

+---------------+

1 row in set (0.11 sec)

mysql> select @a/@b;

+--------+

| @a/@b |

+--------+

| 1.0000 |

+--------+

1 row in set (0.00 sec)

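We can see that the rows to be retrieved amount to roughly 100% of the table, so the optimizer chose not to use the index. The MySQL optimizer estimates how many rows a query is likely to return (the rows column in explain), and if that estimate exceeds a certain threshold it chooses a full table scan instead of the B+ tree index. In my experience that threshold is usually around 20%: once the rows retrieved exceed 20% of the table, the optimizer stops using the index and scans the whole table.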

However, the estimated row count is not exact. The optimizer estimated 356639 rows with a date later than 2006-07-26 15:56:01, while the actual number is 452549.
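The estimate is derived from stored table and index statistics, which can go stale. As a hedged suggestion, refreshing them with ANALYZE TABLE and then re-running explain may bring the rows figure closer to the real count, although for storage engines such as InnoDB the statistics are sampled and stay approximate.

```sql
-- Recompute the stored statistics for the info table.
ANALYZE TABLE info;

-- Re-check the optimizer's row estimate afterwards.
EXPLAIN SELECT * FROM info WHERE date > '2006-07-26 15:56:01';
```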

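The optimizer's choice is not always right, and sometimes you should trust your own judgment. You can use force index(index_name) to run the statement both ways and compare the execution times.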
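A minimal sketch of that comparison, reusing the info table and the index_date index from the example above, might look like this. Timings depend heavily on caching, so it is worth running each statement more than once before drawing conclusions.

```sql
-- Let the optimizer decide (the explain output above showed a full table scan).
SELECT * FROM info WHERE date > '2006-07-26 15:56:01';

-- Force the secondary index on date and compare the reported elapsed time.
SELECT * FROM info FORCE INDEX (index_date)
WHERE date > '2006-07-26 15:56:01';
```

Prefixing either statement with EXPLAIN shows whether the key column actually changes to index_date when the hint is present.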
