Sunday, February 8, 2009

Design: Storing Data

Let's say you have the following data:
NAME          200812     200901     200902
--------- ---------- ---------- ----------
Cable 75 75 75
Groceries 250 225 300
Internet 50 50 50
Mortgage 1000 1000 1000
Phone 25 25 25
The table structure for that would look like the following:
CREATE TABLE t
(
name VARCHAR2(30),
"200812" NUMBER(6,0),
"200901" NUMBER(6,0),
"200902" NUMBER(6,0)
);
Is this the best way to store it though?

Are there times when you would want to store it this way?

What happens when March (200903) rolls around? Well, you'll have to add another column. Not so bad you say? But you'll have to do that same thing the following month, the month after that and so on. Now you have 5 years (60 months/columns) worth of data. Your table becomes a bit unwieldy and difficult to maintain. Not to mention the eyesore of a 60+ column (and ever growing) table.

What if you find a mistake in 200901, Groceries should have been $400. That's easy enough to update right?
UPDATE t 
SET "200901" = 400
WHERE name = 'Groceries';
That's OK for the occasional update I suppose, but what if you wanted an interface through PL/SQL? Each time you add a column you'd have to update the code as well. That sucks.

I suppose you could write some Dynamic SQL to generate the UPDATE statement each and every time, but Dynamic SQL isn't very easy to maintain.

Now, what if you needed to know the history for the table. What would you do?

Well, you could create a history table and a trigger (blech) to populate it on UPDATE and DELETE. Not so bad right?

You'd then be able to query both tables to find the history of a particular column. Your SQL (reports) will have to updated and may get pretty complicated as well.

But wait. You have to add a column for each month which now needs to propogate to:

1. Source Table
2. PL/SQL (unless of the Dynamic variety)
3. Trigger
4. History Table
5. Reports.

How do you make this easy?

I want to write this one time and just let it be. I don't want a history table (nor the accompanying trigger, blech). I don't want to rewrite my code or my SQL each and every time a new month comes.

The underlying model should handle most circumstances and your reports should just work.

First, I want to constrain my "name" list. I could use a CHECK constraint to do so, but I'd like to be able to add to it. Let's create a reference/lookup/crosswalk (not to be confused with an intersection) table.
CREATE TABLE name_types
(
namecode VARCHAR2(30)
CONSTRAINT pk_namecode PRIMARY KEY
);

INSERT INTO name_types ( namecode ) VALUES ( 'Groceries' );
INSERT INTO name_types ( namecode ) VALUES ( 'Internet' );
INSERT INTO name_types ( namecode ) VALUES ( 'Phone' );
INSERT INTO name_types ( namecode ) VALUES ( 'Cable' );
INSERT INTO name_types ( namecode ) VALUES ( 'Mortgage' );
Next comes my main table. I'll use NAME_TYPES.NAMECODE as a Foreign Key in my main table.
CREATE TABLE t
(
namecode
CONSTRAINT fk_namecode_t REFERENCES name_types( namecode )
CONSTRAINT nn_namecode_t NOT NULL,
date_of DATE DEFAULT SYSDATE
CONSTRAINT nn_dateof_t NOT NULL,
amount NUMBER(16,2) DEFAULT 0
CONSTRAINT nn_amount_t NOT NULL,
date_from DATE DEFAULT SYSDATE
CONSTRAINT nn_datefrom_t NOT NULL,
date_to DATE
);
I'm not worried about a Primary Key as this is for demonstration purposes only.

The DATE_TO field will typically be NULL but will allow me to maintain a history as mentioned above. If I no longer what a record to be displayed, I just "end date" it, or populate DATE_TO with the current date.

To keep it mildly short, I'll use the "Groceries" data for starters.
INSERT INTO t
( namecode,
date_of,
amount )
VALUES
( 'Groceries',
ADD_MONTHS( SYSDATE, -2 ),
250 );

INSERT INTO t
( namecode,
date_of,
amount )
VALUES
( 'Groceries',
ADD_MONTHS( SYSDATE, -1 ),
225 );

INSERT INTO t
( namecode,
date_of,
amount )
VALUES
( 'Groceries',
SYSDATE,
300 );

NAMECODE DATE_OF AMOUNT DATE_FROM DATE_TO
---------- --------- ---------- --------- ---------
Groceries 09-DEC-08 250 09-FEB-09
Groceries 09-JAN-09 225 09-FEB-09
Groceries 09-FEB-09 300 09-FEB-09
If I want to update something, I can write one procedure that should update record I want.
UPDATE t
SET amount = 200
WHERE namecode = p_namecode
AND TRUNC( date_of, 'MONTH' ) = p_dateof;
P_NAMECODE begin 'Groceries' in this case and P_DATEOF being the Year and Month (in DATE format) of the record you want to modify.

That won't satisfy our history requirement though. In order to do that, you'll need to UPDATE the DATE_TO column with the current date and then create a new record. It should look something like this:
UPDATE t
SET date_to = SYSDATE
WHERE TRUNC( date_of, 'MONTH' ) = TRUNC( p_dateof, 'MONTH' )
AND namecode = 'Groceries';

INSERT INTO t
( namecode,
dateof,
amount )
VALUES
( p_namecode,
p_dateof,
p_amount );
Now and queries you write would just need the WHERE date_to IS NULL predicate. If you need history, you just remove the predicate.
SELECT *
FROM
(
SELECT
namecode nc,
TO_CHAR( date_of, 'YYYYMM' ) d,
amount
FROM t
)
PIVOT
(
SUM( amount ) total
FOR d IN ( 200812 AS "200812", 200901 AS "200901", 200902 AS "200902" )
)
(Though I still haven't found a great way to have the columns automatically appear as new dates get added, I'll leave that for a future exercise.) The results look like this:
NC         200812_TOTAL 200901_TOTAL 200902_TOTAL
---------- ------------ ------------ ------------
Groceries 250 225 300


Now you have a easy to maintain, robust design. Hardly any maintenance is required. Queries as as simple as:

1 comment:

Anonymous said...

I'd say your first design violates the first normal form (1NF), since the columns can be considered repeated groups.
So this post is actually a good demonstration on why normalization is really important for good design and not just a hobby of anal architects.