Oracle半连接学习总结


半连接(semi-join)

 

在我们查看一个数据集中某些字段存在于另一个数据集合中的记录时,常常会用到in 或者 exists。在执行计划中会看到join semi。

 

在这里给出以下测试用的数据表结构以及模拟数据

drop table table_1 purge;

drop table table_2 purge;

 

create table table_1

as select

cast(rownum as int) a,

cast(rownum+10 as int) b,

cast(dbms_random.string('i',10) as varchar2(10)) c

from dual connect by level<=500000

 

create table table_2

as select

cast(rownum*2 as int) a,

cast(rownum*2+10 as int) b,

cast(dbms_random.string('i',10) as varchar2(10)) c

from dual connect by level<=1000

 

在table_2增加一条已有的数据

insert into table_2 select * from table_2 where a=20

 

create index idx_tab1_a on table_1(a);

create index idx_tab2_a on table_2(a);

 

analyze table table_1 compute statistics

for table

for all indexes

for all indexed columns

 

analyze table table_2 compute statistics

for table

for all indexes

for all indexed columns

 

先看看下面的语句是什么业务含义

select * from table_1 ,table_2 wheretable_1.a=table_2.a

即使找到符合条件的数据,本次迭代也不会停止在table_2中继续往下寻找下一条符合条件的数据,如果再次找到,则迭代继续返回满足条件的数据。

 

 

 

 

 

18

28

LYZDKBNGLN

18

28

SICWAOITLK

20

30

DQCETGYPWE

20

30

HEFBMTNBQL

20

30

DQCETGYPWE

20

30

HEFBMTNBQL

22

32

URPNGTEIBW

22

32

TQNIVPFQUP

 

 

 

 

 

 

select /*d*/ table_1.*

from table_1 ,table_2 where table_1.a=table_2.a

 

in exists的含义为

可以理解为在主数据集作迭代时,如果在副数据集中找到第一个符合条件的数据,即完成本条迭代的操作,

在业务上可以理解为

即可以理解为为

select

a.a,a.b,b.a

from table_1a, (select distinct a from table_2) b

where a.a=b.a --and b.a=20

 

可以清楚看到上面语句与下面语句返回的结果不同

select

a.a,a.b,b.a

from table_1a, (select a from table_2) b

where a.a=b.a --and b.a=20

 

我们来看看上面写法的执行计划

EXPLAIN PLAN FOR

select

a.a,a.b,b.a

from table_1 a, (select distinct a from table_2) b

where a.a=b.a and b.a=20;

 

SELECT * FROM TABLE(DBMS_XPLAN.display()); 

--------------------------------------------------------------------------------------

| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT        |            | 1001 | 39039 |   509   (2)| 00:00:07 |

|   1 |  VIEW                   | VM_NWVW_1  |  1001 | 39039 |   509   (2)| 00:00:07 |

|   2 |   HASH UNIQUE          |            |  1001 | 27027 |   509   (2)| 00:00:07 |

|*  3 |    HASH JOIN           |            |  1001 | 27027 |   508   (2)| 00:00:07 |

|   4 |     INDEX FAST FULL SCAN| IDX_TAB2_A | 1001 |  3003 |     3   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL   |TABLE_1    |   500K|    11M|   502   (1)| 00:00:07 |

--------------------------------------------------------------------------------------

 

 

 

虽然有业务上明确意义的写法,但是这种写法并不是高效率的写法,Oracle提供了IN ESISTS的解法来提高效率

 

EXPLAIN PLAN FOR

select

a.a,a.b

from table_1 a, (select distinct a from table_2) b

where a.a=b.a --and b.a=20;

 

SELECT * FROM TABLE(DBMS_XPLAN.display());

 

--------------------------------------------------------------------------------------

| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT        |           |  1001 | 26026 |   509   (2)| 00:00:07 |

|   1 |  VIEW                  | VM_NWVW_1  |  1001 | 26026 |   509   (2)| 00:00:07 |

|   2 |   HASH UNIQUE          |            |  1001 | 27027 |   509   (2)| 00:00:07 |

|*  3 |    HASH JOIN            |           |  1001 | 27027 |   508   (2)| 00:00:07 |

|   4 |     INDEX FAST FULL SCAN| IDX_TAB2_A | 1001 |  3003 |     3   (0)| 00:00:01 |

|   5 |     TABLE ACCESS FULL   |TABLE_1    |   500K|    11M|   502   (1)| 00:00:07 |

--------------------------------------------------------------------------------------

 

call     count       cpu   elapsed       disk      query   current        rows

------- ------  -------- -------------------- ---------- ---------- ----------

Parse        1      0.00      0.01          0          0          0           0

Execute      1      0.00      0.00          0          0          0           0

Fetch        1      0.00      0.00          0         18          0         100

------- ------  -------- -------------------- ---------- ---------- ----------

total        3      0.00      0.01          0         18          0         100

 

 

 

EXPLAIN PLAN FOR

select

a.a,a.b

from table_1 a

where a.a in (select a from table_2);

 

SELECT * FROM TABLE(DBMS_XPLAN.display());

 

------------------------------------------------------------------------------------

| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |            | 1000 | 20000 |   508   (2)| 00:00:07 |

|*  1 |  HASH JOIN RIGHT SEMI |           |  1000 | 20000 |   508   (2)| 00:00:07 |

|   2 |   INDEX FAST FULL SCAN| IDX_TAB2_A | 1001 |  3003 |     3   (0)| 00:00:01 |

|   3 |   TABLE ACCESS FULL   |TABLE_1    |   500K|  8300K|   502   (1)| 00:00:07 |

------------------------------------------------------------------------------------

 

call     count       cpu   elapsed       disk      query   current        rows

------- ------  -------- -------------------- ---------- ---------- ----------

Parse        1      0.00      0.12          0          0          0           0

Execute      1      0.00      0.00          0          0          0           0

Fetch        1      0.00      0.00          0         10          0         100

------- ------  -------- -------------------- ---------- ---------- ----------

total        3      0.00      0.12          0         10          0         100

 

 

EXPLAIN PLAN FOR

select

a.a,a.b

from table_1 a

where exists (select null from table_2 b where a.a=b.a);

SELECT * FROM TABLE(DBMS_XPLAN.display());

------------------------------------------------------------------------------------

| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |

------------------------------------------------------------------------------------

|   0 | SELECT STATEMENT      |            | 1000 | 20000 |   508   (2)| 00:00:07 |

|*  1 |  HASH JOIN RIGHT SEMI |           |  1000 | 20000 |   508   (2)| 00:00:07 |

|   2 |   INDEX FAST FULL SCAN| IDX_TAB2_A | 1001 |  3003 |     3   (0)| 00:00:01 |

|   3 |   TABLE ACCESS FULL   |TABLE_1    |   500K|  8300K|   502   (1)| 00:00:07 |

------------------------------------------------------------------------------------

 

call     count       cpu   elapsed       disk      query   current        rows

------- ------  -------- -------------------- ---------- ---------- ----------

Parse        1      0.00      0.00          0          0          0           0

Execute      1      0.00      0.00          0          0          0           0

Fetch        1      0.00      0.00          0         10          0         100

------- ------  -------- -------------------- ---------- ---------- ----------

total        3      0.00      0.00          0         10          0         100

 

 

将表table_1数据增加到50万条,执行计划与跟踪结果表明,in 与 exists的写法一样的效率。

  • 1
  • 2
  • 下一页

相关内容