Thursday, February 12, 2009

SELECT INTO or LOOP?

I've always used the SELECT INTO syntax to retrieve data into variables. Lately though, I've seen the LOOP structure used.

Personally, I like the SELECT INTO, it seems clear and concise to me. I didn't want to just dismiss the LOOP offhand so I ran a couple of tests.

First I create my table with random strings:
CREATE TABLE t
AS
SELECT dbms_random.string( 'a', TRUNC( dbms_random.value( 5, 30 ) ) ) x
FROM dual
CONNECT BY level <= 10000000;
Yes, that's 10 Million rows. I just wanted to have a nice number for total time. Everything small returned a big fat zero.

I picked a string out of the table and performed the same test 5 times with each method searching for the same string.

SELECT INTO
DECLARE
l_start_time NUMBER;
l_end_time NUMBER;
l_total_time NUMBER;
l_dummy VARCHAR2(1);
BEGIN
l_start_time := dbms_utility.get_time;

d( l_start_time );

SELECT 'Y'
INTO l_dummy
FROM t
WHERE x = 'XFLCTdLXZjwlHBAqOgdddUCu';

l_end_time := dbms_utility.get_time;

d( l_end_time );
l_total_time := l_end_time - l_start_time;
d( 'Total Time: ' || l_total_time );
END;
/
Here are the results:
SQL>/
Total Time: 86

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.85
SQL>/
Total Time: 87

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.87
SQL>/
Total Time: 86

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.87
SQL>/
Total Time: 87

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.87
SQL>/
Total Time: 87

PL/SQL procedure successfully completed.
Not bad.

LOOP
DECLARE
l_start_time NUMBER;
l_end_time NUMBER;
l_total_time NUMBER;
l_dummy VARCHAR2(1);
BEGIN
l_start_time := dbms_utility.get_time;

d( l_start_time );

FOR i IN ( SELECT 'Y' y
FROM t
WHERE x = 'XFLCTdLXZjwlHBAqOgdddUCu' )
LOOP
l_dummy := i.y;
EXIT;
END LOOP;

l_end_time := dbms_utility.get_time;

d( l_end_time );
l_total_time := l_end_time - l_start_time;
d( 'Total Time: ' || l_total_time );
END;
/
And the results:
SQL>/
Total Time: 86

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.87
SQL>/
Total Time: 88

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.89
SQL>/
Total Time: 88

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.89
SQL>/
Total Time: 88

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.89
SQL>/
Total Time: 87

PL/SQL procedure successfully completed.
Neglibily slower. Nothing to write home about though.

Let's throw an index on there for fun.
CREATE INDEX t_idx
ON t( x );
SELECT INTO
SQL>DECLARE
2 l_start_time NUMBER;
3 l_end_time NUMBER;
4 l_total_time NUMBER;
5 l_dummy VARCHAR2(1);
6 BEGIN
7 l_start_time := dbms_utility.get_time;
8
9 SELECT 'Y'
10 INTO l_dummy
11 FROM t
12 WHERE x = 'XFLCTdLXZjwlHBAqOgdddUCu';
13
14 l_end_time := dbms_utility.get_time;
15
16 l_total_time := l_end_time - l_start_time;
17 d( 'Total Time: ' || l_total_time );
18 END;
19 /
Total Time: 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>/
Total Time: 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00

LOOP
SQL>DECLARE
2 l_start_time NUMBER;
3 l_end_time NUMBER;
4 l_total_time NUMBER;
5 l_dummy VARCHAR2(1);
6 BEGIN
7 l_start_time := dbms_utility.get_time;
8
9 FOR i IN ( SELECT 'Y' y
10 FROM t
11 WHERE x = 'XFLCTdLXZjwlHBAqOgdddUCu' )
12 LOOP
13 l_dummy := i.y;
14 EXIT;
15 END LOOP;
16
17 l_end_time := dbms_utility.get_time;
18
19 l_total_time := l_end_time - l_start_time;
20 d( 'Total Time: ' || l_total_time );
21 END;
22 /
Total Time: 1

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SQL>/
Total Time: 0

PL/SQL procedure successfully completed.
OK, there doesn't seem to be much of a difference performance wise.

Then I got to thinking (look out!). What about errors? no_data_found or too_many_rows? With the SELECT INTO those are thrown, with the LOOP, they are not.

SELECT INTO, no_data_found
SQL>DECLARE
2 l_start_time NUMBER;
3 l_end_time NUMBER;
4 l_total_time NUMBER;
5 l_dummy VARCHAR2(1);
6 BEGIN
7 l_start_time := dbms_utility.get_time;
8
9 SELECT 'Y'
10 INTO l_dummy
11 FROM t
12 WHERE x = 'ORACLENERD';
13
14 l_end_time := dbms_utility.get_time;
15
16 l_total_time := l_end_time - l_start_time;
17 d( 'Total Time: ' || l_total_time );
18 END;
19 /
DECLARE
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at line 9
LOOP, no record found
SQL>DECLARE
2 l_start_time NUMBER;
3 l_end_time NUMBER;
4 l_total_time NUMBER;
5 l_dummy VARCHAR2(1);
6 BEGIN
7 l_start_time := dbms_utility.get_time;
8
9 FOR i IN ( SELECT 'Y' y
10 FROM t
11 WHERE x = 'ORACLENERD' )
12 LOOP
13 l_dummy := i.y;
14 EXIT;
15 END LOOP;
16
17 l_end_time := dbms_utility.get_time;
18
19 l_total_time := l_end_time - l_start_time;
20 d( 'Total Time: ' || l_total_time );
21 END;
22 /
Total Time: 0

PL/SQL procedure successfully completed.
SELECT INTO, too_many_rows
SQL>INSERT INTO t ( x ) VALUES ( 'ORACLENERD' );

1 row created.

Elapsed: 00:00:00.04
SQL>INSERT INTO t ( x ) VALUES ( 'ORACLENERD' );

1 row created.

Elapsed: 00:00:00.01
SQL>DECLARE
2 l_start_time NUMBER;
3 l_end_time NUMBER;
4 l_total_time NUMBER;
5 l_dummy VARCHAR2(1);
6 BEGIN
7 l_start_time := dbms_utility.get_time;
8
9 SELECT 'Y'
10 INTO l_dummy
11 FROM t
12 WHERE x = 'ORACLENERD';
13
14 l_end_time := dbms_utility.get_time;
15
16 l_total_time := l_end_time - l_start_time;
17 d( 'Total Time: ' || l_total_time );
18 END;
19 /
DECLARE
*
ERROR at line 1:
ORA-01422: exact fetch returns more than requested number of rows
ORA-06512: at line 9
If choosing between the 2 methods, performance is not necessarily a consideration. However, one method will throw errors and one will not. If you use the LOOP method, you'll have to add a check IF l_dummy IS NOT NULL THEN or something along those lines. If you use the SELECT INTO method, you can either next the statement in a BEGIN END block or let the exception propogate.

I prefer to let it complain loudly with SELECT INTO. I want to create the BEGIN END block knowing why I am doing it or catching no_data_found or too_many_rows.

One more thing about the LOOP method, what if you have more than one row that matches the criteria (input) you specify? You'll get a row but you can't be sure it's the row. That's not good.

I would recommend SELECT INTO vs the LOOP method. It's prettier (naturally) and it will complain loudly if you don't specify all of the proper predicates (input).

3 comments:

Bradd Piontek said...

Have you tried a test case where you define explicit cursors in the declare section? I'd be curious if there were any difference there.

My rule of thumb used to be, if you are expecting to return 1 or 0 rows, use SELECT INTO, for the reason's you mention below (exceptions being the main one), but use cursors for what they are meant for, looping through data. I'm not sure if this applies as much in the 10g/11g world.

oraclenerd said...

It would be but I doubt there would be much difference performance wise. But you give me an idea, what about Dynamic SQL? That should be significantly slower...

But that's getting away from my point I think. I contend that the SELECT INTO is a better programming style/method/whatever because of the exceptions thrown. You as a developer, should know every possible failure point in your code (unit testing you say?). There's the possibility that you could miss that exception and carry on like nothing happened.

What about the case of having multiple records? How will you know if you have the right one or if you've supplied the right input (predicates)? Imagine debugging that...it would look innocuous...it functions properly...there is no failure...but the downstream processing is not working as expected (it has the wrong row)...that's not fun.

Add debugging to my list of why you should use SELECT INTO vs LOOP

Tom said...

Just Bulk Collect :) LOL