Monday, February 2, 2009

How Not to Use Surrogate Keys

There have been plenty of discussions about the use of surrogate keys vs. natural keys. This is not about that.

For some reason lately I've been noticing surrogate keys...a lot. Specifically in analytic functions or in the ORDER BY clause.

Here's an example of it's use in an analytic function:
SELECT
id,--surrogate key
col1,
col2,
col3,
ROW_NUMBER() OVER ( PARTITION BY col1 ORDER BY id ) my_rownum
FROM some_table;
OK, maybe I should have said anywhere you see an ORDER BY clause. Shoot me.

Like I was saying, don't use them in the ORDER BY clause.

The natural assumption (yes I know about those), assuming (again, I know) it's done via a sequence, is to assume (man...) that the highest number was the last one created. Not necessarily.

What if you have a 2 node RAC Cluster?
CREATE TABLE t
(
id NUMBER,
created TIMESTAMP DEFAULT SYSTIMESTAMP
);

CREATE SEQUENCE t_seq
START WITH 100
CACHE 100
INCREMENT BY 1
NOCYCLE;

In my tnsnames.ora file, I created an entry for each node (as opposed to have each ADDRESS in one entry).

I connect to node 1 and run the following:
INSERT INTO t ( id ) VALUES ( t_seq.nextval );
I then connect to node 2
INSERT INTO t ( id ) VALUES ( t_seq.nextval );
COMMIT;
And back to node 1
INSERT INTO t ( id ) VALUES ( t_seq.nextval );
COMMIT;
A simple SELECT
SELECT * FROM t;

ID CREATED
------ --------------------------------
100 02-FEB-09 07.19.20.786144 PM
101 02-FEB-09 07.19.32.562516 PM
200 02-FEB-09 07.19.25.100219 PM
Now let's order it by CREATED:
SELECT * FROM t ORDER BY created;

ID CREATED
------ -------------------------------
100 02-FEB-09 07.19.20.786144 PM
200 02-FEB-09 07.19.25.100219 PM
101 02-FEB-09 07.19.32.562516 PM
Lesson? Don't rely on the sequence generated surrogate key to be in ORDER of creation.

Update: I've used the surrogate many times in these types of cases. I suppose the only "reliable" place to use the surrogate key would be in the child table (where the surrogate key is a foreign key and has "some" value). I've just now realized it and will move away from it going forward.

4 comments:

Bradd Piontek said...

You don't even need a 2-node RAC cluster for that, although it is the easiest to get that behavior. If your sequence is set up with NOORDER (the default) the is no guarantee on the order of the sequence. I'm not disputing your work, just adding on to it. If you want a 'key' that is sequential, with no gaps, don't use an Oracle sequence. That isn't what it was intended for.

oraclenerd said...

I'm sure you're not trying to be difficult. ;)

It's an old Tom Kyte addage..I've read those "gapless sequence" articles a number of times.

I think many people (myself included) can easily believe that they'll, at the very least, be in order...thanks for point that out though.

Gary Myers said...

Another point to note is the order in which records are created isn't necessarily the order in which the transactions commit and those records become visible.
If you have any application (eg replication) where a job 'remembers' the id or created/updated date of the last transaction it processed, then it can miss transactions that created records earlier but committed later. Much better to remember SCNs and pick up data with a later SCN.

oraclenerd said...

Good point Gary.

Would you say comparable to using ROWIDs in a non-discrete way? Or better said, trying to use ROWID as a way to identify a row at some undetermined point in the future?