Tuesday, March 17, 2009

Capturing Record History

Somehow I ended up on StackOverflow tonight answering this question on "Database history for client usage."

The Question
I'm trying to figure out what would be the best way to have a history on a database, to track any Insert/Delete/Update that is done. The history data will need to be coded into the front-end since it will be used by the users. Creating "history tables" (a copy of each table used to store history) is not a good way to do this, since the data is spread across multiple tables.

At this point in time, my best idea is to create a few History tables, which the tables would reflect the output I want to show to the users. Whenever a change is made to specific tables, I would update this history table with the data as well.

I'm trying to figure out what the best way to go about would be. Any suggestions will be appreciated.

I am using Oracle + VB.NET
My Answer
Personally, I would stay away from triggers. They can be a nightmare when it comes to debugging and not necessarily the best if you are looking to scale out.

If you are using an PL/SQL API to do the INSERT/UPDATE/DELETEs you could manage this in a simple shift in design without the need (up front) for history tables.

All you need are 2 extra columns, DATE_FROM and DATE_THRU. When a record is INSERTed, the DATE_THRU is left NULL. If that record is UPDATEd or DELETEd, just "end date" the record by making DATE_THRU the current date/time (SYSDATE). Showing the history is as simple as selecting from the table, the one record where DATE_THRU is NULL will be your current or active record.

Now if you expect a high volume of changes, writing off the old record to a history table would be preferable, but I still wouldn't manage it with triggers, I'd do it with the API.
The first answer was the suggestion of a history table along with a trigger. I wanted to dive into it a little further here.


I don't particularly care for triggers as they can "hide" actions from you. But the answer IS valid, it's the easy way to go in my opinion.

As stated above, I prefer one table, with the DATE_FROM (or create date) and the DATE_THRU. DATE_THRU is null until that record has been changed in some way. If the record has been updated, the old one will get a DATE_THRU of the current date/time and a new record will be INSERTed. If DELETEd, just the UPDATE of DATE_THRU will be performed.


I tend to think of addresses when thinking of something like this. An address is created on day 1. On day 2, the person comes back and changes their zip code. There are now 2 records in the table associatied with that customer and only one that is active (Day 2's).

Like I said in my answer though, if this is a high volume table with lots of changes to be expected, you may have to rethink this (I still wouldn't go with the trigger approach). Either keep my method and have a retention policy for records that have been end dated (DATE_THRU NOT NULL) and write them off somewhere else or go with the history table, but write the records via the PL/SQL API.

You also have to ask yourself if you really need to keep track of changes. Does it really matter? Can this be handled in the design another way? Can it be handled by not letting certain records be UPDATEd or DELETEd?

5 comments:

Narendra said...

Can't AUDIT command be one way to achive this with minimal code to generate history? Well, an obvious advantage is we will be using a built-in feature (which will be tried and tested).

oraclenerd said...

Narendra,

I think there are certain things that are application related that may or may not need to be displayed to the end-user. I don't think Auditing would necessarily be the best way to do this.

But you're right, why build something that Oracle has already built. Leverage the heck out of your database.

John T said...

From the original question - "Creating "history tables" (a copy of each table used to store history) is not a good way to do this, since the data is spread across multiple tables."

This assumption is plain wrong. Spreading your data is exactly what you want to do. Never forget about the basics of i/o. Disk reads continue to be the slowest operation within a computer system. i/o 101 stuff - put your tables on different heads, your indexes on different heads, your redo on different heads, etc. A lot has changed since the days where you had to stripe the local disk, but the principle is still the same. Spreading the load out to different tables will improve scalability & give you options down the line.

Oracle auditing is a good option, but since you are using .NET, I'm assuming you are using some type of connection pool & a common service account. Oracle auditing wont have any way to tell who is making the update.

Your best option here might be a pl/sql API to the table where you can pass user info from the web tier, and then have the API manage the audit entries.

Narendra said...

I think there are certain things that are application related that may or may not need to be displayed to the end-user.
But I guess you can always filter out what you don't want to "display". Generating the audit data and displaying it to user are two separate acivities. And if built-in AUDIT generates too much audit (i.e. it does not provide selective auditing capabilities), then Fine-Grained Auditing (FGA) can be another built-in option. The reason for my preference for using built-in features is, as Tom Kyte says, if business has paid for it once (while purchasing license) why should it be made to pay for developing (and maintaining) similar functionality ?
John,
Even if the application is using connection pool and common service account, it is (generally) a requirement that end-user identity be identified (and stored) in the database. Otherwise, the objective of auditing itself may be lost in applications that use connection pool.

John T said...

Keep in mind the auditing in Oracle requires named Oracle accounts. If you are using a connection pool, your web application users will have no identity in Oracle. Oracle will only see the common service account, and the operating system account of the application server. The built in Oracle auditing will be useless.