Tuesday, September 8, 2009

Views: Complex Join Use Primary Keys?

Views have been on my mind quite a bit lately.

Last night I began to wonder if it makes a difference which key you use in the view. Logically, I thought, it would make a difference.

Here are my create scripts:
CREATE TABLE t
(
my_id NUMBER(12)
CONSTRAINT pk_myid PRIMARY KEY
);

INSERT INTO t( my_id )
SELECT rownum
FROM dual
CONNECT BY LEVEL <= 1000000;

CREATE TABLE t_child
AS
SELECT rownum child_id, my_id
FROM t;

ALTER TABLE t_child
ADD CONSTRAINT pk_childid
PRIMARY KEY ( child_id );

ALTER TABLE t_child
ADD CONSTRAINT fk_myid_tchild
FOREIGN KEY ( my_id )
REFERENCES t( my_id );

CREATE INDEX idx_myid_tchild
ON t_child( my_id );

CREATE TABLE t_child_2
AS
SELECT rownum child_id_2, child_id
FROM t_child;

ALTER TABLE t_child_2
ADD CONSTRAINT pk_childid2
PRIMARY KEY ( child_id_2 );

ALTER TABLE t_child_2
ADD CONSTRAINT fk_childid_tchild2
FOREIGN KEY ( child_id )
REFERENCES t_child( child_id );

CREATE INDEX idx_childid_tchild2
ON t_child_2( child_id );
So I wanted to see if Oracle took a different path depending on how I built the view.
CREATE OR REPLACE 
VIEW vw_test
AS
SELECT
tc.my_id,--note that isn't the PK from T
t2.child_id--again, not the PK from T_CHILD
FROM
t,
t_child tc,
t_child_2 t2
WHERE t.my_id = tc.my_id
AND tc.child_id = t2.child_id;
So now I run a couple of tests to see what happens when I SELECT on those columns (reminder, those are not the primary keys, they are indexed foreign keys).
CJUSTICE@TESTING>EXPLAIN PLAN FOR
2 SELECT *
3 FROM vw_test
4 WHERE my_id = 1;

Explained.

Elapsed: 00:00:00.00
CJUSTICE@TESTING>
CJUSTICE@TESTING>SELECT * FROM TABLE( dbms_xplan.display );

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 1671340153

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 14 | 6 (0)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| T_CHILD | 1 | 9 | 4 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IDX_MYID_TCHILD | 1 | | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_CHILDID_TCHILD2 | 1 | 5 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TC"."MY_ID"=1)
filter("TC"."MY_ID" IS NOT NULL)
4 - access("TC"."CHILD_ID"="T2"."CHILD_ID")
OK, so it did what I expected: it did not use the primary key on T to get the record. I'll do it again, with the same construct as the view, but filtering on the PK from T.
CJUSTICE@TESTING>EXPLAIN PLAN FOR
2 SELECT
3 tc.my_id,
4 t2.child_id
5 FROM
6 t,
7 t_child tc,
8 t_child_2 t2
9 WHERE t.my_id = tc.my_id
10 AND tc.child_id = t2.child_id
11 AND t.my_id = 1;

Explained.

Elapsed: 00:00:00.01
CJUSTICE@TESTING>
CJUSTICE@TESTING>@explain

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 4007286110

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 7 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 18 | 7 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 13 | 5 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_MYID | 1 | 4 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_CHILD | 1 | 9 | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IDX_MYID_TCHILD | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX_CHILDID_TCHILD2 | 1 | 5 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."MY_ID"=1)
5 - access("TC"."MY_ID"=1)
6 - access("TC"."CHILD_ID"="T2"."CHILD_ID")
Good. Now I'll recreate the view using the primary key and see if we get the same result.
CREATE OR REPLACE
VIEW vw_test
AS
SELECT
t.my_id,
t2.child_id
FROM
t,
t_child tc,
t_child_2 t2
WHERE t.my_id = tc.my_id
AND tc.child_id = t2.child_id;

CJUSTICE@TESTING>EXPLAIN PLAN FOR
2 SELECT *
3 FROM vw_test
4 WHERE my_id = 1;

Explained.

Elapsed: 00:00:00.03
CJUSTICE@TESTING>@explain

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 4007286110

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 7 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 18 | 7 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 13 | 5 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_MYID | 1 | 4 | 2 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| T_CHILD | 1 | 9 | 3 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IDX_MYID_TCHILD | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX_CHILDID_TCHILD2 | 1 | 5 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."MY_ID"=1)
5 - access("TC"."MY_ID"=1)
6 - access("TC"."CHILD_ID"="T2"."CHILD_ID")
Now I'll try the same test using the FK on T_CHILD_2, CHILD_ID. No need to change the view as it's already there.
CJUSTICE@TESTING>EXPLAIN PLAN FOR
2 SELECT *
3 FROM vw_test
4 WHERE child_id = 1;

Explained.

Elapsed: 00:00:00.01
CJUSTICE@TESTING>@EXPLAIN

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 474290160

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 6 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 18 | 6 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 1 | 13 | 4 (0)| 00:00:01 |
| 3 | TABLE ACCESS BY INDEX ROWID| T_CHILD | 1 | 9 | 3 (0)| 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | PK_CHILDID | 1 | | 2 (0)| 00:00:01 |
|* 5 | INDEX UNIQUE SCAN | PK_MYID | 989K| 3863K| 1 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | IDX_CHILDID_TCHILD2 | 1 | 5 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("TC"."CHILD_ID"=1)
5 - access("T"."MY_ID"="TC"."MY_ID")
6 - access("T2"."CHILD_ID"=1)
And now the same filter against the base tables, using the PK on T_CHILD:
CJUSTICE@TESTING>EXPLAIN PLAN FOR
2 SELECT
3 tc.my_id,
4 t2.child_id
5 FROM
6 t,
7 t_child tc,
8 t_child_2 t2
9 WHERE t.my_id = tc.my_id
10 AND tc.child_id = t2.child_id
11 AND tc.child_id = 1;

Explained.

Elapsed: 00:00:00.01
CJUSTICE@TESTING>
CJUSTICE@TESTING>@explain

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 3182888138

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 5 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | 1 | 14 | 5 (0)| 00:00:01 |
|* 2 | TABLE ACCESS BY INDEX ROWID| T_CHILD | 1 | 9 | 3 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | PK_CHILDID | 1 | | 2 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IDX_CHILDID_TCHILD2 | 1 | 5 | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("TC"."MY_ID" IS NOT NULL)
3 - access("TC"."CHILD_ID"=1)
4 - access("T2"."CHILD_ID"=1)
Looks like a much better path when I use the PK in the view definition. Note to self: when building complex views, use the Primary Key column, not the Foreign Key column.

Admittedly, I can't explain all the "magic" behind it, and I'm hoping someone out there can help explain. Logically, it makes sense, as the Primary Key allows you the fastest access to a specific record (discounting the rowid).

7 comments:

Gary Myers said...

"the Primary Key allows you the fastest access to a specific record (discounting the rowid)."
Not always. Richard Foote has some great stuff on indexes.
Firstly, a primary key is a constraint, not an index, though it would need to be supported by a unique index.
Generally a unique index would be faster than a non-unique index. But a smaller index (fewer levels) could be faster than a larger one.
So it is possible for a smaller non-unique index to be faster than a larger unique one.
If you've got an indexed nullable field, and 99.999% of records are null, then it could be really small (even a single block) and so be REALLY fast.
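
To make the point about index size concrete, here is a rough sketch of how one might compare the indexes from the post. It assumes the T, T_CHILD and T_CHILD_2 tables above exist in the current schema and that index statistics have been gathered; otherwise BLEVEL and LEAF_BLOCKS may be empty.

```sql
-- Compare uniqueness, height and size of the indexes used in the post.
-- BLEVEL is the number of branch levels above the leaf blocks,
-- so a lower value means fewer block visits per index probe.
SELECT index_name,
       uniqueness,
       blevel,
       leaf_blocks
FROM user_indexes
WHERE table_name IN ( 'T', 'T_CHILD', 'T_CHILD_2' )
ORDER BY table_name, index_name;
```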

Narendra said...

Gary,
Are you sure about this?
"Firstly, a primary key is a constraint, not an index, though it would need to be supported by a unique index"
If I am not mistaken, one can create a primary key constraint based on a non-unique index.
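
A minimal sketch of what that could look like. The table and index names here (t_nu, idx_t_nu_id, pk_t_nu) are made up for illustration and are not part of the original post.

```sql
-- Hypothetical table, created only to illustrate the point.
CREATE TABLE t_nu
(
  id NUMBER(12)
);

-- A plain (non-unique) index on the future key column.
CREATE INDEX idx_t_nu_id
  ON t_nu( id );

-- Ask Oracle to enforce the primary key with the existing index.
ALTER TABLE t_nu
  ADD CONSTRAINT pk_t_nu
  PRIMARY KEY ( id )
  USING INDEX idx_t_nu_id;

-- Check which index backs the constraint and whether it is unique.
SELECT c.constraint_name, c.index_name, i.uniqueness
FROM user_constraints c
  JOIN user_indexes i ON i.index_name = c.index_name
WHERE c.table_name = 'T_NU';
```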

Chet,
That is an interesting observation. Thanks.

Narendra said...

Chet,

Reading your post again (pardon me... I don't get things easily... :)), I am not sure about your conclusion:
"Looks like a much better path when I use the PK in the view definition. Note to self, if building complex views, don't use the Foreign Key column, use the Primary Key column."

In the examples that you provided, why do you think including the PK results in a "better" plan? In fact, including the PK has added more steps to the plan. For example, in the view definition:

CREATE OR REPLACE
VIEW vw_test
AS
SELECT
tc.my_id,--note that isn't the PK from T
t2.child_id--again, not the PK from T_CHILD
FROM
t,
t_child tc,
t_child_2 t2
WHERE t.my_id = tc.my_id
AND tc.child_id = t2.child_id

I didn't get why table T needs to be referenced? Am I missing the point?
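
To illustrate the question, here is a sketch of the same view written without T at all. The view name vw_test_no_t is hypothetical. The IS NOT NULL predicate is my assumption about what is needed to match the original join: rows with a null my_id would never have joined to T, which seems to be why the first plan in the post shows filter("TC"."MY_ID" IS NOT NULL).

```sql
-- Hypothetical view: same columns as the first VW_TEST, but without T.
CREATE OR REPLACE
VIEW vw_test_no_t
AS
SELECT
  tc.my_id,
  t2.child_id
FROM
  t_child tc,
  t_child_2 t2
WHERE tc.child_id = t2.child_id
  AND tc.my_id IS NOT NULL; -- stands in for the join to T

EXPLAIN PLAN FOR
SELECT *
FROM vw_test_no_t
WHERE my_id = 1;

SELECT * FROM TABLE( dbms_xplan.display );
```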

DomBrooks said...

The point, old boy, is, as Narendra points out, that in your initial view the optimizer recognises that although "T" is in your view definition, it has no need to go anywhere near it to satisfy the query.

The foreign key constraints mean that, if you do not need anything selected from the parent table, the optimizer does not need to visit the parent table to validate the join condition.

Remove the foreign keys and you should see the plan that you think you want ;)
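
A sketch of that experiment, using the names from the post. I am assuming FK_MYID_TCHILD is the constraint that matters for the join to T, since it is the one that ties T_CHILD.MY_ID to the primary key of T.

```sql
-- Drop the foreign key from T_CHILD to T.
ALTER TABLE t_child
  DROP CONSTRAINT fk_myid_tchild;

-- Same query as the first view definition: only child columns selected.
EXPLAIN PLAN FOR
SELECT
  tc.my_id,
  t2.child_id
FROM
  t,
  t_child tc,
  t_child_2 t2
WHERE t.my_id = tc.my_id
  AND tc.child_id = t2.child_id
  AND tc.my_id = 1;

-- Without the constraint, the plan would be expected to visit T (PK_MYID).
SELECT * FROM TABLE( dbms_xplan.display );
```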

DomBrooks said...

If you do drop the constraints to see the difference on the plan, after that recreate them as ENABLE NOVALIDATE and see how this means that the optimizer can't then eliminate the join to "T".
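
Roughly, that second step might look like the following. It assumes FK_MYID_TCHILD has already been dropped as suggested above.

```sql
-- Recreate the foreign key enabled but not validated:
-- new rows are checked, existing rows are not vouched for.
ALTER TABLE t_child
  ADD CONSTRAINT fk_myid_tchild
  FOREIGN KEY ( my_id )
  REFERENCES t( my_id )
  ENABLE NOVALIDATE;

-- Confirm the state of the constraint.
SELECT constraint_name, status, validated
FROM user_constraints
WHERE constraint_name = 'FK_MYID_TCHILD';

-- Re-run the query that selects only child columns and compare the plan.
EXPLAIN PLAN FOR
SELECT
  tc.my_id,
  t2.child_id
FROM
  t,
  t_child tc,
  t_child_2 t2
WHERE t.my_id = tc.my_id
  AND tc.child_id = t2.child_id
  AND tc.my_id = 1;

SELECT * FROM TABLE( dbms_xplan.display );
```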

Gary Myers said...

"If I am not mistaken, one can create a primary key constraint based on a non-unique index."
Possibly, but the primary key must be unique, so you might as well use a unique index. I can understand there may be times you'd use a non-unique index to support a unique constraint, but I'd only expect to see that if the constraint was deferrable or not validated, which I wouldn't expect from a PK.
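
For the deferrable case, a small sketch. The table and constraint names (t_deferrable, pk_t_deferrable) are invented for illustration. My understanding is that Oracle backs a deferrable primary key with a non-unique index, so I would expect UNIQUENESS to come back as NONUNIQUE, but it is worth checking on your own version.

```sql
-- Hypothetical table, created only to illustrate the point.
CREATE TABLE t_deferrable
(
  id NUMBER(12)
);

-- A deferrable primary key: uniqueness can be checked at commit time.
ALTER TABLE t_deferrable
  ADD CONSTRAINT pk_t_deferrable
  PRIMARY KEY ( id )
  DEFERRABLE INITIALLY IMMEDIATE;

-- See what kind of index Oracle created to support it.
SELECT c.constraint_name, c.deferrable, i.index_name, i.uniqueness
FROM user_constraints c
  JOIN user_indexes i ON i.index_name = c.index_name
WHERE c.table_name = 'T_DEFERRABLE';
```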