Thursday, July 30, 2009

Design: Persist or Derive?

The easy answer is, "It depends."

Derive:
to derive (third-person singular simple present derives, present participle deriving, simple past and past participle derived)

1. (transitive) To obtain or receive (something) from something else.
2. (transitive, logic) To deduce (a conclusion) by reasoning.
3. (transitive, linguistics) To find the derivation of (a word or phrase).
4. (transitive, chemistry) To create (a compound) from another by means of a reaction.
5. (intransitive) To originate or stem (from).
Persist(ence):

To persist data in the database world is to store/write it to a table.

When you begin to model a database application, you have to make decisions on whether or not to persist the data (store it in a table) or derive the data on the fly (using SQL, Views, etc.)

Age is a good candidate. Do you store the person's actual age or do you calculate it on the fly? If you store it, you'll need to create a process (daily, weekly, etc.) to ensure that it is accurate right? My birthday is on November 12th. Prior to that, I will be 37. After that, I will be 38. How do you know when to update it? On top of that, do you want to know how old I am in months? Days? Minutes? Or seconds? Do you store all of those?

Assuming you've captured the my date of birth (DOB), a simple query (better, a view) will calculate all those on the fly.
WITH dob
AS
(
SELECT TO_DATE( '11/12/1971', 'MM/DD/YYYY' ) m_dob
FROM dual
)
SELECT
m_dob dob,
TRUNC( ( SYSDATE - m_dob ) / 364.25 ) age_in_years,
TRUNC( ( SYSDATE - m_dob ) / ( 364.25 / 12 ) ) age_in_months,
TRUNC( SYSDATE - m_dob ) age_in_days,
TRUNC( ( SYSDATE - m_dob ) * ( 24 ) ) age_in_hours,
TRUNC( ( SYSDATE - m_dob ) * ( 24 * 60 ) ) age_in_mins,
TRUNC( ( SYSDATE - m_dob ) * ( 24 * 60 * 60 ) ) age_in_secs
FROM dob;

DOB: 11/12/1971
AGE_IN_YEARS: 37
AGE_IN_MONTHS: 453
AGE_IN_DAYS: 13,775
AGE_IN_HOURS: 330,617
AGE_IN_MINS: 19,837,026
AGE_IN_SECS: 1,190,221,597
So that was a pretty simplistic example.

Here's a harder one and the inspiration behind this post. Let's say you have 2 tables, PEOPLE and PEOPLE_NAMES.



A simpler design would be to store the names inline. This has a couple of benefits:
1. It's all in one table. No need to "complicate" things by joining to another table (yes, I know, that's what Oracle does)
2. Fairly straight-forward code to maintain that table. CREATE/UPDATE_PERSON would handle everything just fine.

OK, So #2 wasn't a great one. Hopefully you'll have a better explanation of benefits.

Heres' what it looks like:



So back to the first example, PEOPLE and PEOPLE_NAMES.
1. you can only have one name at a time (nick names is a different story).
2. you want to see the history and search on history (i.e. Maiden name).

Let me back up a little bit. I approach design from the reporting angle, not the functional angle. I believe the two go hand in hand though (naturally, you can't have reporting on a non-functional system). I think this comes from my background as an end-user. Without information, what good is the data?

Background over.

With Rule #1, you'll have only one record "active/current" at any given time. This will be maintained by the application layer. I've seen some interesting ways to maintain that rule with the use of triggers and/or materialized views, but I'm not going there. Of course I just realized that it's missing the start_date and end_date columns. Just assume they are there.

So how do you mimic the SELECT from the second example (i.e. names inline)? A View silly.
CREATE OR REPLACE
VIEW vw_people
AS
SELECT
p.peopleid,
p.dob,
p.gender,
pn.title,
pn.first_name,
pn.middle_name,
pn.last_name,
pn.suffix,
pn.printed_name,
pn.start_date,
pn.end_date
FROM
people p,
people_names pn
WHERE p.peopleid = pn.peopleid
AND pn.end_date IS NULL;
In your reports (user interfaces), you use the view as opposed to SELECTing directly from the table. VW_PEOPLE now resembles PEOPLE from the second example.

What does this have to do with Persist or Derive? Well, I forgot...wait, did I just completely screw up this example? I'm not really deriving the name, I'm storing it, just in a different table.

This post will definitely have a Part II.

Apologies for wasting your time.

Update
Part II can be found here. Hopefully it makes up for this one.

No comments: