Thursday, July 30, 2009

Design: Persist or Derive? Part II

Part I is here.

I realized near the end of that post that I had completely screwed it up. I think some of the intent was conveyed, but not really what I wanted. I'm going to try it again.


ONLINE_STORES has 4 columns:
  • ONLINESTOREID - sequence generated surrogate key
  • NAME - String, can be anything really
  • START_DATE - When this store went online
  • END_DATE - When this store went offline
ONLINE_STORE_STATUS - The purpose of this table is to capture the history of a store's status, either up (available) or down (unavailable). A record is created when the store goes down; either END_DATE is set to a pre-determined time in the future (say 30 minutes from now), or END_DATE is left NULL and the store must be manually re-activated. It has the following columns:
  • ONLINESTORESTATUSID - sequence generated surrogate key
  • REASON - Why was the store de-activated or brought down. Typically I would supply a list of known reasons, but it's unnecessary for my purposes.
  • START_DATE - Time the online store was de-activated.
  • END_DATE - Time the online store was re-activated.
What about Persist or Derive?

Similar to the example from Part I, you could answer this question (is an online store up?) another way: by storing the status inline in ONLINE_STORES. Add a STATUS column (Persist) with a check constraint that limits the values to either UP or DOWN, along with a NOT NULL constraint of course.
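A minimal sketch of that Persist approach; the column and constraint names here are my own invention:

```sql
ALTER TABLE online_stores ADD
(
  status VARCHAR2(4) DEFAULT 'UP'
    CONSTRAINT nn_status_onlinestores NOT NULL
    CONSTRAINT ck_status_onlinestores CHECK ( status IN ( 'UP', 'DOWN' ) )
);
```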

In a pure OLTP environment that is probably the most efficient solution. Reporting on down times, or better, how long has an online store been UP, is sometimes an afterthought. This can be handled by a shadow/history/audit/logging table. Those have always felt clunky to me.

Many systems are a hybrid of OLTP and reporting (DW/DSS/etc.). My approach has been to tie the two tables together using a View (Derive) to get the answer to whether an online store is UP or DOWN. There might even be a name for that in the modeling books...I should read up.

Here are the scripts necessary for my demo:
CREATE TABLE online_stores
(
  onlinestoreid NUMBER(10)
    CONSTRAINT pk_onlinestoreid PRIMARY KEY,
  name VARCHAR2(100)
    CONSTRAINT nn_name_onlinestores NOT NULL,
  start_date DATE DEFAULT SYSDATE
    CONSTRAINT nn_startdate_onlinestores NOT NULL,
  end_date DATE
);

CREATE TABLE online_store_status
(
  onlinestorestatusid NUMBER(10)
    CONSTRAINT pk_onlinestorestatusid PRIMARY KEY,
  onlinestoreid NUMBER(10)
    CONSTRAINT fk_onlinestoreid_oss REFERENCES online_stores( onlinestoreid )
    CONSTRAINT nn_onlinestoreid_oss NOT NULL,
  reason VARCHAR2(100)
    CONSTRAINT nn_reason_oss NOT NULL,
  start_date DATE DEFAULT SYSDATE
    CONSTRAINT nn_startdate_oss NOT NULL,
  end_date DATE
);

INSERT INTO online_stores
( onlinestoreid,
  name )
VALUES
( 1,
  'online store #1' );

INSERT INTO online_stores
( onlinestoreid,
  name )
VALUES
( 2,
  'nerds r us' );

INSERT INTO online_stores
( onlinestoreid,
  name )
VALUES
( 3,
  'geeks rule' );


SELECT * FROM online_stores;

ONLINESTOREID NAME            START_DAT END_DATE
------------- --------------- --------- ---------
            1 online store #1 30-JUL-09
            2 nerds r us      30-JUL-09
            3 geeks rule      30-JUL-09

3 rows selected.
Now I just create a simple View on top of these 2 tables:
CREATE OR REPLACE
VIEW vw_active_stores
AS
SELECT os.onlinestoreid,
       os.name,
       os.start_date,
       os.end_date
FROM online_stores os
WHERE NOT EXISTS
  ( SELECT 1
    FROM online_store_status
    WHERE onlinestoreid = os.onlinestoreid
    AND ( end_date IS NULL
          OR SYSDATE BETWEEN start_date AND end_date ) );
Create a record in ONLINE_STORE_STATUS:
INSERT INTO online_store_status
( onlinestorestatusid,
  onlinestoreid,
  reason )
VALUES
( 1,
  1,
  'maintenance' );
Select from the View:

ONLINESTOREID NAME            START_DAT END_DATE
------------- --------------- --------- ---------
            3 geeks rule      30-JUL-09
            2 nerds r us      30-JUL-09

2 rows selected.
Voila! I Derived!

As a brief sanity check, I created a record that had a pre-determined re-activation date (1 hour forward).
INSERT INTO online_store_status
( onlinestorestatusid,
  onlinestoreid,
  reason,
  start_date,
  end_date )
VALUES
( 2,
  2,
  'the geek sliced us...',
  SYSDATE - ( 1 / 24 ),
  SYSDATE + ( 1 / 24 ) );


ONLINESTOREID NAME            START_DAT END_DATE
------------- --------------- --------- ---------
            3 geeks rule      30-JUL-09

1 row selected.
I'm really not sure which way is better or worse; as with anything, I guess "It depends." The semaphore (flag) in ONLINE_STORES is a perfectly viable solution. It is, admittedly, the easiest solution. Part of my thinking also relates back to a question I posed once before, UPDATEs in OLTP: A Design Flaw? If I UPDATE the record in ONLINE_STORES, it has meaning: typically it would either be to change the name or to set the END_DATE. The UPDATE in ONLINE_STORE_STATUS means something else; it just tells me the stop time of the DOWN time.

Or am I just overthinking this? Is this too much idealism? Is it idealism at all? I've talked about it so much lately I can't remember which way is up. What do you do?

Design: Persist or Derive?

The easy answer is, "It depends."

to derive (third-person singular simple present derives, present participle deriving, simple past and past participle derived)

1. (transitive) To obtain or receive (something) from something else.
2. (transitive, logic) To deduce (a conclusion) by reasoning.
3. (transitive, linguistics) To find the derivation of (a word or phrase).
4. (transitive, chemistry) To create (a compound) from another by means of a reaction.
5. (intransitive) To originate or stem (from).

To persist data in the database world is to store/write it to a table.

When you begin to model a database application, you have to make decisions on whether or not to persist the data (store it in a table) or derive the data on the fly (using SQL, Views, etc.)

Age is a good candidate. Do you store the person's actual age or do you calculate it on the fly? If you store it, you'll need to create a process (daily, weekly, etc.) to ensure that it stays accurate, right? My birthday is on November 12th. Prior to that, I will be 37. After that, I will be 38. How do you know when to update it? On top of that, do you want to know how old I am in months? Days? Minutes? Seconds? Do you store all of those?

Assuming you've captured my date of birth (DOB), a simple query (better, a view) will calculate all of those on the fly.
WITH dob AS
(
  SELECT TO_DATE( '11/12/1971', 'MM/DD/YYYY' ) m_dob
  FROM dual
)
SELECT
  m_dob dob,
  TRUNC( ( SYSDATE - m_dob ) / 365.25 ) age_in_years,
  TRUNC( ( SYSDATE - m_dob ) / ( 365.25 / 12 ) ) age_in_months,
  TRUNC( SYSDATE - m_dob ) age_in_days,
  TRUNC( ( SYSDATE - m_dob ) * ( 24 ) ) age_in_hours,
  TRUNC( ( SYSDATE - m_dob ) * ( 24 * 60 ) ) age_in_mins,
  TRUNC( ( SYSDATE - m_dob ) * ( 24 * 60 * 60 ) ) age_in_secs
FROM dob;

DOB: 11/12/1971
AGE_IN_YEARS: 37
AGE_IN_MONTHS: 452
AGE_IN_DAYS: 13,775
AGE_IN_HOURS: 330,617
AGE_IN_MINS: 19,837,026
AGE_IN_SECS: 1,190,221,597
So that was a pretty simplistic example.

Here's a harder one and the inspiration behind this post. Let's say you have 2 tables, PEOPLE and PEOPLE_NAMES.

A simpler design would be to store the names inline. This has a couple of benefits:
1. It's all in one table. No need to "complicate" things by joining to another table (yes, I know, that's what Oracle does)
2. Fairly straight-forward code to maintain that table. CREATE/UPDATE_PERSON would handle everything just fine.

OK, so #2 wasn't a great one. Hopefully you'll have a better explanation of the benefits.

Here's what it looks like:
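The original screenshot didn't survive, so here's a rough sketch of the inline version (the name columns are my guesses):

```sql
CREATE TABLE people
(
  peopleid NUMBER(10)
    CONSTRAINT pk_people PRIMARY KEY,
  first_name VARCHAR2(50) NOT NULL,
  last_name VARCHAR2(50) NOT NULL,
  dob DATE
);
```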

So back to the first example, PEOPLE and PEOPLE_NAMES.
1. you can only have one name at a time (nicknames are a different story).
2. you want to see the history and search on history (i.e. Maiden name).

Let me back up a little bit. I approach design from the reporting angle, not the functional angle. I believe the two go hand in hand though (naturally, you can't have reporting on a non-functional system). I think this comes from my background as an end-user. Without information, what good is the data?

Background over.

With Rule #1, you'll have only one record "active/current" at any given time. This will be maintained by the application layer. I've seen some interesting ways to maintain that rule with the use of triggers and/or materialized views, but I'm not going there. Of course I just realized that it's missing the start_date and end_date columns. Just assume they are there.
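Since the diagram is gone, here's roughly what the two-table design looks like with the start_date and end_date columns included (column names are assumptions on my part):

```sql
CREATE TABLE people
(
  peopleid NUMBER(10)
    CONSTRAINT pk_peopleid PRIMARY KEY,
  dob DATE
);

CREATE TABLE people_names
(
  peoplenameid NUMBER(10)
    CONSTRAINT pk_peoplenameid PRIMARY KEY,
  peopleid NUMBER(10)
    CONSTRAINT fk_peopleid_pn REFERENCES people( peopleid )
    CONSTRAINT nn_peopleid_pn NOT NULL,
  first_name VARCHAR2(50) NOT NULL,
  last_name VARCHAR2(50) NOT NULL,
  start_date DATE DEFAULT SYSDATE
    CONSTRAINT nn_startdate_pn NOT NULL,
  end_date DATE
);
```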

So how do you mimic the SELECT from the second example (i.e. names inline)? A View, silly.
CREATE OR REPLACE
VIEW vw_people
AS
SELECT p.peopleid,
       pn.first_name,  -- name columns assumed from the design above
       pn.last_name
FROM people p,
     people_names pn
WHERE p.peopleid = pn.peopleid
AND pn.end_date IS NULL;
In your reports (user interfaces), you use the view as opposed to SELECTing directly from the table. VW_PEOPLE now resembles PEOPLE from the second example.

What does this have to do with Persist or Derive? Well, I forgot...wait, did I just completely screw up this example? I'm not really deriving the name, I'm storing it, just in a different table.

This post will definitely have a Part II.

Apologies for wasting your time.

Part II can be found here. Hopefully it makes up for this one.

Monday, July 27, 2009

IDs (NUMBER) vs. Codes (VARCHAR2)

A slightly different way of putting that is Surrogate vs. Natural keys.

Not that I really want to get into the entire discussion...but I will...a little.

On Friday I was having a discussion with a developer friend after I showed him my Party model. It's all about boozing it up. OK, wishful thinking on my part; it's just a database design based on the Party (or Entity) Model. I've pondered it here and here.

I'm not planning on debating the merits of the Party Model. I do like it though, it seems to me a natural end point. It does not take into account testing of the design, nor does it take into account places where you might denormalize...and many choose to denormalize by keeping addresses in line with people or organizations.

On to my point.

To maintain data integrity without using Check constraints, I'll create a small lookup table.



Perhaps a few others. ADDRESS_CODES, as the name suggests, would use a code as the key, a VARCHAR2. HOME, WORK, OTHER, etc. Same goes for the others.
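To make that concrete, here's a sketch of a code-keyed lookup table (the naming is my own):

```sql
CREATE TABLE address_codes
(
  address_code VARCHAR2(10)
    CONSTRAINT pk_address_codes PRIMARY KEY,
  description VARCHAR2(100)
    CONSTRAINT nn_desc_addresscodes NOT NULL
);

INSERT INTO address_codes ( address_code, description ) VALUES ( 'HOME', 'Home address' );
INSERT INTO address_codes ( address_code, description ) VALUES ( 'WORK', 'Work address' );
INSERT INTO address_codes ( address_code, description ) VALUES ( 'OTHER', 'Other address' );
```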

I posted my query on twitter Friday morning and received a pretty solid response from 4 people.

The response was swift:

Which was exactly what my conversation with my friend had led to.

In this case, why would natural keys, VARCHAR2s, be bad?

1. If a user enters them in, they could misspell something, CELL would be CLEL or something. Fair enough.
2. If #1 happens, you just fix the name field instead of the key.

So I started to come around to it, until I got to STATES. States have a 2 letter abbreviation. Why not use that? It's not like they'll change (hah!) right?
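STATES keyed on the postal abbreviation itself would look something like this sketch:

```sql
CREATE TABLE states
(
  state_code VARCHAR2(2)
    CONSTRAINT pk_states PRIMARY KEY,
  state_name VARCHAR2(50)
    CONSTRAINT nn_statename_states NOT NULL
);

INSERT INTO states ( state_code, state_name ) VALUES ( 'FL', 'Florida' );
```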

Perhaps a distinction is in order then. If it's user-entered content, use a surrogate key. If not, use the supplied code.

I don't know. Is this just my personal preference? Is there a standard? What do you do?

Who's Going to OOW 09?

I am!

Who says Twitter is useless?

You can find more details about Oracle Open World here. Truthfully, I didn't get this one through Twitter, I went through the official channels and registered as a blogger.

I did annoy the heck out of @oracleopenworld.

So I owe a few people beer? It's a modest trade-off to get to attend OOW. Plus, I tend to go to the free places and buy them beer there. Makes life easier.

Now all I need is airfare and hotel accommodations in San Francisco. Anyone want to sponsor me for a few months? ;)

Sunday, July 26, 2009

Logging, Debugging, Instrumentation and Profiling

It started with this tweet from Cary Millsap [company | book | blog | twitter ]

That took me to this article on his blog, Multitasking: Productivity Killer.

I then began to look through his blog history and one in particular caught my attention, On the Usefulness of Software Instrumentation, from February of this year. I left a comment:
I'm having a conceptual crisis.

Profiling. Instrumenting. Debugging. Logging.

What's the difference? What's the same?

I have one AOL friend that says adding lines to code is debug code (meaning bad), while others say it is instrumentation (meaning good). Now I've got profiling to worry about? Come on!

Can you help a poor guy out? Maybe do a post on it? If you already have, point me to it? Anything? Help? :)
To which he responded:
Hmm, good one. I don't have enough time left today to work on it before close of business, but let's chat about it via email. That ought to help you out, and then I'll be glad to keep thinking about it and write something up. Write me via —Cary
What follows is the conversation between Cary and me, reprinted here with his permission.

Me: July 15, 2009, 10:09 PM
As directed...;)

I'm a big believer in Instrumentation, until recently, when I got all confused (shocker).

I was having a conversation with an old colleague...I asked about Instrumentation, and he indicated that Instrumentation created the ability to essentially monitor your system so that operational folks could adjust (more hardware, bandwidth, etc). My clumsy example, logins typically take 1ms, now they are taking 10ms. He agreed, sort of. The best I could figure out, from his definition, was that Instrumentation of software was like SixSigma is for process. (I only have a vague understanding of SixSigma).

Then I stumbled on your article. BTW, I think this is the Tom Kyte article you were referring to.

**Logging, I get. I think. Logging allows you to capture exceptions or other "bad" things and then, optionally, notify someone if it is severe enough.
**Debugging. It's a superset of Logging, I think. You are technically logging, but you may or may not have the notifications or ability to take immediate action.
**Instrumentation. ?
**Profiling. ?

I used to think I knew what Instrumentation was, but now I'm not so sure.

I think I'm fairly good at designing, then writing code for Oracle database applications. I consider security in my design and code stages. I consider future enhancements in design and code. I try to think of it all. I've even used debug statements liberally applied to all of my code. I want to do it the right way though. So my company brings you in to take a look, an audit, a pre-emptive strike if you will, and you say, "Man, this is some pretty awesome code here! Where's my check?"

I would love to see you write something up, 5 different articles, one on each topic then put them all together in a final post. Just kidding of course. Whatever you do write though, I'll be there asking questions!
Cary: July 15, 2009 11:36 PM
Here are some thoughts. I'd like you to press back with your questions and comments before I consider putting this up as a blogpost.

I think a good definition for debug code is, it's the code you put into your application when you'd rather have your code tell you what it's doing than use a proper debugger (with breakpoints and watches and all that). It's the printf("Here I am\n") statements that you throw into your code to help you figure out why it's not doing what you thought it was going to do. I use both debuggers and debug code. I use debug code anywhere I feel like a program I'm shipping to someone may need to report some information to me after it's deployed. Actually, I take that back...

When debug code gets fancy and becomes a permanent fixture in your code, like "if ($debug =~ /[gex]/i) { sprintf("%.6f entering %s\n", gettimeofday(), caller()); }", that's when I think it's really logging code. I think of logging as a permanent part of your code that adds some kind of operational feature to your code. It's more for the person managing how your code runs than it is for the person who runs it as a business function. Logs answer questions like, "Show me when the different phases of last night's process ran." The Oracle alert file is a log. Things that log4j prints out are logs.

Instrumentation, I'd say, is logging with a time-oriented purpose. I think people use the word pretty loosely to mean anything they want it to mean, so maybe instrumentation includes everything that's not a core function of the application. But, I like the word instrumentation. It puts a picture in my mind of a Starrett micrometer measuring the thickness of some Swiss watch part to within 0.0001 of an inch. It seems to me like the kind of a word that should mean something more careful than just "all code that's not core function code." When I say instrumentation, I mean logging with a focus on where time has gone.

Now, profile code is a different matter entirely; it is a very carefully restricted subset of instrumentation. A profile is a spanning, non-overlapping account of response time. Spanning means that the sum of the elements has to be greater than or equal to the response time the profile describes. Non-overlapping means that the sum of the elements has to be less than or equal to the response time the profile describes. In other words, a profile has to describe exactly a response time event; not less time than that, nor more. (For tasks with parallelized subtasks in them, the overall task has a profile whose interesting part requires drilling down into the individual serial [but executed in parallel] subtasks.) So, profile code is a type of instrumentation that is logging code that explains time in such a manner that you can create a profile from it.

The 10046 trace code path inside the Oracle Database barely qualifies as profile code, because a profile can be constructed from its output data, but I say "barely," because it's very, very difficult. It may be easy to get it 80% right, like examples in Jeff's and my Optimizing Oracle Performance book explain, but the last 20% is a huge chore. That last 20% is most of why the Method R Profiler (which was called the Hotsos Profiler when I wrote about it in OOP) has so many man-years of labor in it. Good profiling code is difficult to write (especially because one of your constraints needs to be to minimize the measurement intrusion effect of the profiling code itself), but it's like any other functional requirement. First, you have to know what you want, which requires trial-and-error before you get the design right. Then it becomes a software development project for a specific function, which you build just like you'd build anything else.

Me: July 16, 2009 12:11 PM
OK, Debugging and Logging are much more clear now.

You said:

It seems to me like the kind of a word that should mean something more careful than just "all code that's not core function code."

Would you define core-function code? Would you want to Instrument everything? Or is that the piece I am missing?


Instrumentation, I'd say, is logging with a time-oriented purpose.

and then

When I say instrumentation, I mean logging with a focus on where time has gone.

I'm not sure I'm following you with the time piece (pun intended). If I have Debug code which has now become Logging code, the table (or file, etc) would also capture the current date and time (or timestamp) right? You could use that to get metrics and such. Or am I missing something again?

On Profiling:

(especially because one of your constraints needs to be to minimize the measurement intrusion effect of the profiling code itself)

Kind of like introducing a TV crew or video camera? Where people act different because the camera is there? Isn't that an anthropology concept as well?

Profiling code lives outside of the code? Is that the concept? You have to be at a pretty low level to capture that stuff don't you?

I've got a ways to go.

If I didn't say it, I truly appreciate you taking the time for this.
Cary: July 16, 2009 11:11 AM
Inline below...


On Thu, Jul 16, 2009 at 11:11 AM, chet justice wrote:
I'm going to take a gander first, without help.

OK, Debugging and Logging are much more clear now.

You said:

It seems to me like the kind of a word that should mean something more careful than just "all code that's not core function code."

Would you define core-function code? Would you want to Instrument everything? Or is that the piece I am missing?

I would define "core-function code" loosely as the stuff that non-technical people talk about when they describe what your code does.

There are engineering trade-offs at every turn. Oracle example: We have 10046 tracing, which turns on instrumentation for every dbcall and oscall executed by the Oracle kernel. Some people shy away from using this instrumentation level because they're worried about the measurement intrusion effect. I believe that the intrusion is almost always well worth the cost. Oracle also provides 10200 tracing, which turns on instrumentation for every consistent read executed by the Oracle kernel. Now there is some measurement intrusion, because it makes the kernel emit a long line of trace data every time it executes the CR code path. Oracle provides both (and a couple hundred more, too), because sometimes you need both. But, thankfully, they gave us the ability to turn them on/off independently, so the run-time operational manager gets to make the trade-off decision about how much measurement intrusion he's willing to live with.

So, in response to your "instrument everything?" question, it's an artistic answer with an economic constraint: You need to instrument everything that will be in your business's best interest to have instrumented when the code goes out. The economic constraint is that you don't want to instrument something you'll never need to activate in the field: that's a waste of development time that could have been better allocated to writing some other function. I say artistic because you can't always know what parts of your code are going to need the instrumentation. You have to feel your way through that, and sometimes events over time will very much surprise you.


Instrumentation, I'd say, is logging with a time-oriented purpose.

and then

When I say instrumentation, I mean logging with a focus on where time has gone.

I'm not sure I'm following you with the time piece (pun intended). If I have Debug code which has now become Logging code, the table (or file, etc) would also capture the current date and time (or timestamp) right? You could use that to get metrics and such. Or am I missing something again?

No, I think you're fine there. What I really need to say, I guess, is that the word instrumentation might have so many different loose definitions that it might be a good word to stay away from, except as a general category marker.

On Profiling:

(especially because one of your constraints needs to be to minimize the measurement intrusion effect of the profiling code itself)

Kind of like introducing a TV crew or video camera? Where people act different because the camera is there? Isn't that an anthropology concept as well?

Yes. Except with machines, the intrusion effect is much easier to measure and predict. There's the occasional race condition problem that the presence of instrumentation can make more difficult to reproduce, so it's still an interesting, potentially nonlinear effect. (I'm talking about the times when running without tracing takes 10 seconds. Then you turn trace on, and your thing never takes more than 2 seconds. That happens sometimes.)

Profiling code lives outside of the code? Is that the concept? You have to be at a pretty low level to capture that stuff don't you?

There are two parts to profiling: (1) the emission of detailed event timings, and (2) the aggregation of those timings into something usable for an analyst. Part #1 is what happens when you run a.out after doing a "cc -pg hello.c". The execution writes out detailed timing data to a file called gmon.out. Then part #2 happens when you run gprof upon gmon.out to produce a readable, analyzable report. With Oracle, part #1 is turning on trace; part #2 is running something like our Profiler. Part #1 happens inside your code. Part #2 happens outside your code.

I've got a ways to go.

If I didn't say it, I truly appreciate you taking the time for this.

You're welcome. I learn by explaining; I'm glad you consider me a good person to learn with this week.
Me: July 16, 2009 2:30 PM
Ah ha! I think I have it now.

Profiling is using Logging (possibly Debugging) information that has been turned on (possible performance impact, which is at least measurable), reading that Data and then turning it into Information (report)?

When you say artistic, do you mean something like processing transactions (not database transactions but financial)? That would be a good candidate wouldn't it?

Cary: July 16, 2009 3:10 PM
What I mean by artistic is that there aren't any hard-and-fast rules that a programmer can use to determine what code paths to instrument, and which not to. And there are no hard-and-fast rules describing how to segment the instrumentation; e.g., 10046 and 10200 should be separately controlled? or should both instrumentation types be combined into one switch? I use the "art" word to denote tasks for which there's no deterministic list of steps that an inexperienced, untalented person can follow to do just as good a job as an experienced, talented one.

You might like to read the paper that I've attached.
I'm still looking for the public link to this paper. I will update this post when I find it.

Me: July 21, 2009 12:47 AM
Interesting read. I haven't quite wrapped my head around it, but I've only read it once.

So to sum it up (sort of), I, as a PL/SQL Developer can't really do that much in regards to profiling? When you talk about profiling, you're talking at the kernel level right? If not...then I'm lost! :)

Cary: July 21, 2009 9:03 AM
Well, as a PL/SQL programmer, you have 10046 data, which can be assembled into a profile. And you have DBMS_PROFILER, which gives you a response time profile partitioned by PL/SQL line number. As Jeff and I suggested in our book, the analysis process there is:

1. Create a response time profile with 10046 data.
2. If the majority of your time is spent in an EXEC call upon a PL/SQL block (as opposed to a PARSE, EXEC, or FETCH of a SQL statement within the block), then use DBMS_PROFILER to figure out which line(s) of PL/SQL are causing your performance problem.
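(An aside from me, not from Cary's email: for step 1, one common way to switch on extended SQL trace for your own session is the 10046 event.)

```sql
-- enable extended SQL trace with binds and waits for this session
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- run the slow code here, then switch tracing back off
ALTER SESSION SET EVENTS '10046 trace name context off';
```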
Me: July 21, 2009 11:36 AM
duh...dbms_profiler. I am a bit slow on occasion.

Very cool stuff. Part of my lack of understanding is that I've never really used a debugger or profiler. I think my first exposure (looking back now) was just a couple of months ago with JRockit, the ability to "look" and see what was running in the JVM. I've done Java programming before, but I never advanced past the barely intermediate stage. PL/SQL doesn't (or hadn't, to my knowledge, until now) make things easy...from a command line anyway.

So I profiled a simple session calling DBMS_LOCK.SLEEP. Pretty cool stuff.
SYS@TESTING>BEGIN
  2    dbms_profiler.start_profiler
  3      ( run_comment => 'JUST A TEST' );
  4
  5    dbms_lock.sleep(60);
  6
  7    dbms_profiler.stop_profiler;
  8  END;
  9  /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SYS@TESTING>SELECT * FROM plsql_profiler_runs;

     RUNID RELATED_RUN RUN_OWNER    RUN_DATE  RUN_COMMENT
---------- ----------- ------------ --------- ------------

1 row selected.

SYS@TESTING>SELECT * FROM plsql_profiler_units;

     RUNID UNIT_NUMBER UNIT_TYPE         UNIT_OWNER    UNIT_NAME
---------- ----------- ----------------- ------------- -----------

4 rows selected.

SYS@TESTING>SELECT * FROM plsql_profiler_data;

     RUNID UNIT_NUMBER      LINE# TOTAL_OCCUR TOTAL_TIME   MIN_TIME   MAX_TIME
---------- ----------- ---------- ----------- ---------- ---------- ----------
         1           1          1           0          0          0          0
         1           1          8           0          0          0          0
         1           1         16           0       1069       1069       1069
         1           1         32           0        654        654        654
         1           1         41           0          0          0          0
         1           1         49           0          0          0          0
         1           1         57           0          0          0          0
You can find more information on DBMS_PROFILER here.

I have not completely wrapped my mind around this yet. I think it might be in part due to the fact that I have never really programmed in another language like C or Java (well, a little) where you run the debugger and profiler against your code.

I don't know if this conversation makes me look like a complete dolt either...I don't really care if it does. Cary graciously made himself available to answer my questions and tolerated my...inexperience in this area. It's no wonder he is held in such high regard in the Oracle community. Not only does he know his stuff, he's more than willing to share it.

Hopefully some of you will get something out of this conversation. I'm still hoping that Cary will do his own posts in the future, but I am grateful he let me share this with you.

Friday, July 24, 2009

(Old) Social Media and Travis Page

Driving home on Wednesday listening to the local AM station, 970 WFLA, I heard an announcement for the radio resumes and the most recent winner, Travis Page.

Travis then came on air, told us a little about himself and what he was looking for. I couldn't believe it when I heard Database Technologies.

Really? On the radio?

When I got home I had to check this guy out. First thing I wanted to know, was he an Oracle guy or a Microsoft guy. Reading through his resume I found that he did have a little bit of Oracle, but he was obviously focused on SQL Server. Boo.

I had to give the guy credit though. I look(ed) for jobs through twitter, dice, monster, etc., pretty much all the places where you didn't have to talk. This guy used the radio. The radio isn't new, or shiny. Perhaps it can be shiny...but I digress.

So I dropped Travis a line telling him about tonight's SOUG meeting and encouraged him to attend. Sure enough he did.

I love that kind of dedication. I've tried to help people in the past only to see them ignore my advice.

So, if you're in the Tampa Bay area and looking for a junior developer, either Oracle or SQL Server (ugh), call Travis immediately for an interview.

We talked for a bit after the meeting and he seems like a very genuine guy (plus, he showed up!). Besides his little SQL Server problem, I think a company could find good use out of a guy like him. Resourceful and dedicated are two very good traits to have in an employee.

If you missed it up above, you can find Travis' resume here.

Thursday, July 23, 2009

SQL Objects, JDeveloper, JPublisher and Java

This is one for my application developer friends.

Say your application uses stored procedures. Not just any stored procedures, but ones that accept arrays or SQL Object types (User Defined Types - UDTs).

One thing you have to do is map a Java object to that SQL Object before you can pass it back to the stored procedure. I can't remember the specifics, but I can show you how to "automatically" create the Java code that maps to the SQL Object.

First, create the UDTs. It will be a table of records...or is that called a multi-dimensional array? Whatever.
CREATE TYPE r_record AS OBJECT
(
  id NUMBER(10),
  col1 VARCHAR2(30),
  col2 VARCHAR2(30),
  col3 VARCHAR2(30)
);
/

CREATE TYPE t_table AS TABLE OF r_record;
/

I'll create a procedure that accepts T_TABLE as an input parameter. It won't do anything, but just to give you an idea.
CREATE OR REPLACE
PROCEDURE test( p_t_table IN T_TABLE )
IS
BEGIN
  NULL;
END test;
/
Now go into JDeveloper and, through the Database Navigator, select the schema where you placed these objects.

Right click on the object and select Generate Java (as shown above).

You'll then be presented with an options menu:

It's all foreign to me so I accept the defaults. I do notice, however, that it's pretty customizable; for instance, you can select how you want Number Types to be generated: objectjdbc, oracle, jdbc or bigdecimal. You'll know better than I will what this means.

From that, you get 3 files: one which maps to the TABLE OF R_RECORD, one which maps to the SQL Type R_RECORD, and finally one whose purpose I don't really know. I'm sure you will though.

Here's the code generated, in order.
import java.sql.SQLException;
import java.sql.Connection;
import oracle.jdbc.OracleTypes;
import oracle.sql.ORAData;
import oracle.sql.ORADataFactory;
import oracle.sql.Datum;
import oracle.sql.ARRAY;
import oracle.sql.ArrayDescriptor;
import oracle.jpub.runtime.MutableArray;

public class TTable implements ORAData, ORADataFactory
{
  public static final String _SQL_NAME = "CJUSTICE.T_TABLE";
  public static final int _SQL_TYPECODE = OracleTypes.ARRAY;

  MutableArray _array;

  private static final TTable _TTableFactory = new TTable();

  public static ORADataFactory getORADataFactory()
  { return _TTableFactory; }

  /* constructors */
  public TTable()
  {
  }

  public TTable(RRecord[] a)
  {
    _array = new MutableArray(2002, a, RRecord.getORADataFactory());
  }

  /* ORAData interface */
  public Datum toDatum(Connection c) throws SQLException
  {
    return _array.toDatum(c, _SQL_NAME);
  }

  /* ORADataFactory interface */
  public ORAData create(Datum d, int sqlType) throws SQLException
  {
    if (d == null) return null;
    TTable a = new TTable();
    a._array = new MutableArray(2002, (ARRAY) d, RRecord.getORADataFactory());
    return a;
  }

  public int length() throws SQLException
  {
    return _array.length();
  }

  public int getBaseType() throws SQLException
  {
    return _array.getBaseType();
  }

  public String getBaseTypeName() throws SQLException
  {
    return _array.getBaseTypeName();
  }

  public ArrayDescriptor getDescriptor() throws SQLException
  {
    return _array.getDescriptor();
  }

  /* array accessor methods */
  public RRecord[] getArray() throws SQLException
  {
    return (RRecord[]) _array.getObjectArray(
      new RRecord[_array.length()]);
  }

  public RRecord[] getArray(long index, int count) throws SQLException
  {
    return (RRecord[]) _array.getObjectArray(index,
      new RRecord[_array.sliceLength(index, count)]);
  }

  public void setArray(RRecord[] a) throws SQLException
  {
    _array.setObjectArray(a);
  }

  public void setArray(RRecord[] a, long index) throws SQLException
  {
    _array.setObjectArray(a, index);
  }

  public RRecord getElement(long index) throws SQLException
  {
    return (RRecord) _array.getObjectElement(index);
  }

  public void setElement(RRecord a, long index) throws SQLException
  {
    _array.setObjectElement(a, index);
  }
}
import java.sql.SQLException;
import java.sql.Connection;
import oracle.jdbc.OracleTypes;
import oracle.sql.ORAData;
import oracle.sql.ORADataFactory;
import oracle.sql.Datum;
import oracle.sql.STRUCT;
import oracle.jpub.runtime.MutableStruct;

public class RRecord implements ORAData, ORADataFactory
{
  public static final String _SQL_NAME = "CJUSTICE.R_RECORD";
  public static final int _SQL_TYPECODE = OracleTypes.STRUCT;

  protected MutableStruct _struct;

  protected static int[] _sqlType = { 2,12,12,12 };
  protected static ORADataFactory[] _factory = new ORADataFactory[4];
  protected static final RRecord _RRecordFactory = new RRecord();

  public static ORADataFactory getORADataFactory()
  { return _RRecordFactory; }
  /* constructors */
  protected void _init_struct(boolean init)
  { if (init) _struct = new MutableStruct(new Object[4], _sqlType, _factory); }
  public RRecord()
  { _init_struct(true); }
  public RRecord(java.math.BigDecimal id, String col1, String col2, String col3) throws SQLException
  { _init_struct(true);
    setId(id);
    setCol1(col1);
    setCol2(col2);
    setCol3(col3);
  }

  /* ORAData interface */
  public Datum toDatum(Connection c) throws SQLException
  {
    return _struct.toDatum(c, _SQL_NAME);
  }

  /* ORADataFactory interface */
  public ORAData create(Datum d, int sqlType) throws SQLException
  { return create(null, d, sqlType); }
  protected ORAData create(RRecord o, Datum d, int sqlType) throws SQLException
  {
    if (d == null) return null;
    if (o == null) o = new RRecord();
    o._struct = new MutableStruct((STRUCT) d, _sqlType, _factory);
    return o;
  }
  /* accessor methods */
  public java.math.BigDecimal getId() throws SQLException
  { return (java.math.BigDecimal) _struct.getAttribute(0); }

  public void setId(java.math.BigDecimal id) throws SQLException
  { _struct.setAttribute(0, id); }

  public String getCol1() throws SQLException
  { return (String) _struct.getAttribute(1); }

  public void setCol1(String col1) throws SQLException
  { _struct.setAttribute(1, col1); }

  public String getCol2() throws SQLException
  { return (String) _struct.getAttribute(2); }

  public void setCol2(String col2) throws SQLException
  { _struct.setAttribute(2, col2); }

  public String getCol3() throws SQLException
  { return (String) _struct.getAttribute(3); }

  public void setCol3(String col3) throws SQLException
  { _struct.setAttribute(3, col3); }
}
import java.sql.SQLException;
import java.sql.Connection;
import oracle.jdbc.OracleTypes;
import oracle.sql.ORAData;
import oracle.sql.ORADataFactory;
import oracle.sql.Datum;
import oracle.sql.REF;
import oracle.sql.STRUCT;

public class RRecordRef implements ORAData, ORADataFactory
{
  public static final String _SQL_BASETYPE = "CJUSTICE.R_RECORD";
  public static final int _SQL_TYPECODE = OracleTypes.REF;

  REF _ref;

  private static final RRecordRef _RRecordRefFactory = new RRecordRef();

  public static ORADataFactory getORADataFactory()
  { return _RRecordRefFactory; }
  /* constructor */
  public RRecordRef()
  {
  }

  /* ORAData interface */
  public Datum toDatum(Connection c) throws SQLException
  {
    return _ref;
  }

  /* ORADataFactory interface */
  public ORAData create(Datum d, int sqlType) throws SQLException
  {
    if (d == null) return null;
    RRecordRef r = new RRecordRef();
    r._ref = (REF) d;
    return r;
  }

  public static RRecordRef cast(ORAData o) throws SQLException
  {
    if (o == null) return null;
    try { return (RRecordRef) getORADataFactory().create(o.toDatum(null), OracleTypes.REF); }
    catch (Exception exn)
    { throw new SQLException("Unable to convert "+o.getClass().getName()+" to RRecordRef: "+exn.toString()); }
  }

  public RRecord getValue() throws SQLException
  {
    return (RRecord) RRecord.getORADataFactory().create(
      _ref.getSTRUCT(), OracleTypes.REF);
  }

  public void setValue(RRecord c) throws SQLException
  {
    _ref.setValue((STRUCT) c.toDatum(_ref.getJavaSqlConnection()));
  }
}
How do you integrate all that? I'm not really sure, but Marc recently wrote up a piece on using this method to tie into SQLUnit; a pretty nice one, too.

Wednesday, July 22, 2009

SOUG: Attack Your Database Before Others Do

SOUG, also known as the Suncoast Oracle Users Group, is holding its monthly meeting tomorrow night in Tampa. The meeting is held at the PricewaterhouseCoopers facilities located across the street from Raymond James Stadium, home of the Tampa Bay Buccaneers.

Todd DeSantis, of Sentrigo, will be presenting. I will not try to recreate the presentation description; I'll just practice the fine art of cut and paste.
In this presentation, we will show typical security flaws found in PL/SQL and Java code due to programmer mistakes. We will demonstrate how to use existing open-source scanning and fuzzing tools to automatically find and flag such flaws, and also demonstrate how creating your own tools in PL/SQL can help you keep your code secure.

In this presentation, you will learn:
1. Common security mistakes developers make
2. How to use open source tools to find those mistakes
3. How to roll out your own PL/SQL fuzzer

Todd DeSantis brings a wealth of technical knowledge and a passion for using technology to better society to his position as lead North American Sales Engineer at Sentrigo. With a background in computer science from Worcester Polytechnic Institute, Todd has been using his understanding of computer programming and database systems throughout his career. At Sentrigo Todd is striving to bring a higher level of database security and safety to the enterprise. Prior to Sentrigo Todd successfully helped Fortune 50 companies rethink data access paradigms with Endeca Technologies. Todd started his career at Enerjy Technologies where he helped organizations improve overall levels of Java code quality and visibility. In his spare time Todd, an avid audiophile, enjoys working toward creating the 'absolute sound' with hi-fi audio systems, and enjoys many different genres of music.
There are possible opportunities for (beer) networking afterwards.

Tuesday, July 21, 2009

Join the ORACLENERD Family - II

Part I can be found here courtesy of Jake [ blog | twitter ].

I had been thinking about this for a few months now, but I can't seem to recall the initial inspiration. In May, I asked Tiffany Morgan to do a guest post here and she came up with the ever popular, Top Ten Grammatical Errors That Make People Look Stup—Silly.....

I asked a former colleague a couple of months ago to write an article based on a short presentation he gave at work on XML Schemas (with constraints and everything!).

Now I'm asking you.

Why?
Good question. In my correspondence with Jake he asked me what my goals were:
What's your goal? Drive traffic? Increase content? Broaden what's covered? Post less? World domination?
Definitely the last one, World Domination.

Honestly though, I hadn't thought about it.

In talking with a couple of people tonight, I think I have a better idea now.
  1. I like to share.
  2. I want others to share.
OK, maybe it wasn't as thought out as I thought.

To Jake's questions
Drive Traffic? - Not so much. Admittedly, I check Google Analytics more than once a day. I'm fascinated by the traffic. Especially cool is the city overlay map, where I can see that people from all over the world have visited this site.

Money? (my question)
I've made $73.27. August will be 2 years. I have 294 posts.
SELECT ROUND( ( 73.27 / 294 ), 3 ) money FROM dual;

     MONEY
----------
      .249

It's not about money. I still just want that $100 check from Google, then I'll decide about the ads. Sure, traffic may go up with guest posts, but I don't think it's reasonable to expect to make money at this.

Increase content?
Yes! That would be awesome. That can't be a bad thing.

Broaden what's covered?
Absolutely! My world lies in the database. Oracle owns about 16 million products now. I'd love to have more exposure, however small, to those technologies.

Post less?
No. I like writing. It's cathartic to me. Helps me communicate my ideas better when I have to spell them out. So no, I don't plan on going anywhere.

World domination?
Who wouldn't want this?

Why? Part II
In the 4 or so years I've been reading blogs, many have come and gone. The esteemed Mr. Kyte has posted 13 times this year. 13. Not that I blame him or anything, I never could understand how he could answer all those questions (repeatedly) and still have time to blog. I'm not calling him out...I get "life."

In that regard, maybe it would be easier for some to just contribute here? First, it's not OTN, so you don't have to come up with the most amazing idea ever. B, it's casual. Third, maybe it will inspire you to start your own blog (on Oracle). I would love that. I would love to be a part of that.

In conclusion (I always hated having to say that), think about it. If you are in the least bit interested, drop me a line chet at oraclenerd, or tweet me @oraclenerd, or whatever. Even if I can help you get your own started, I'd be happy to do that. Perhaps you just started and you want (slightly more) exposure? Write it up here and we'll link back to you.

Lastly, I don't have defined rules. I don't really want them either. I've suggested the WTFPL license, which should be just fine.

Problems, issues, flames, thoughts? Bring 'em on!

Constraints to the Max!

I ran across this question today on the Oracle-l list:
Hi List,

I have read the following but I am looking for a way to create a
case-insensitive database. Is it even possible?


Roger Xu
Later in the thread, Niall Litchfield replied with the following:
Assuming that you mean you want to make all string data case insensitive and that the requirement has come from developers who don't want to check their inputs for case errors (though I bet they want to say business logic is an application function) then a check constraint on each column that the inserted/updated value is equal to its uppercase representation is a start. Then they'll have to either check for the constraint failing or start discussions with you about database input validation aka constraints.
That gave me an idea.
CREATE TABLE t
(
  x VARCHAR2(30)
    CONSTRAINT ck_upper_x_t CHECK ( x = UPPER( x ) )
);
What does that do? It ensures that the value put in X is always uppercase.
INSERT INTO t ( x ) VALUES ( 'home' );

ERROR at line 1:
ORA-02290: check constraint (PARTY.CK_UPPER_X_T) violated

INSERT INTO t ( x ) VALUES ( 'HOME' );

1 row created.
Cool. Now the Application Developers have to deal with it. No wiggle room there.

How far can you go?
Let's see.

CREATE TABLE t
(
  x VARCHAR2(30)
    CONSTRAINT ck_upper_x_t CHECK ( x = UPPER( x ) )
    CONSTRAINT ck_nospaces_x_t CHECK ( INSTR( x, ' ' ) = 0 )
    CONSTRAINT ck_charonly_x_t CHECK ( REGEXP_INSTR( x, '[[:digit:]]' ) = 0 )
);
I'm sure I could go on and on...but it's kind of fun.
INSERT INTO t ( x ) VALUES ( 'HELLO WORLD' );

ERROR at line 1:
ORA-02290: check constraint (PARTY.CK_NOSPACES_X_T) violated
No spaces allowed.
INSERT INTO t ( x ) VALUES ( '12345' )
ERROR at line 1:
ORA-02290: check constraint (PARTY.CK_CHARONLY_X_T) violated
No digits allowed!
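If you want to play with the same idea outside Oracle, here's a minimal sketch using SQLite via Python's sqlite3. SQLite supports inline CHECK constraints too; the REGEXP check is omitted since SQLite has no regexp function by default, and the table/values here simply mirror the Oracle example above.

```python
import sqlite3

# In-memory database; two inline CHECK constraints on the column,
# mirroring CK_UPPER_X_T and CK_NOSPACES_X_T above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        x TEXT NOT NULL
          CHECK ( x = UPPER(x) )       -- uppercase only
          CHECK ( INSTR(x, ' ') = 0 )  -- no spaces
    )
""")

conn.execute("INSERT INTO t ( x ) VALUES ( 'HOME' )")  # passes both checks

for bad in ("home", "HELLO WORLD"):
    try:
        conn.execute("INSERT INTO t ( x ) VALUES ( ? )", (bad,))
    except sqlite3.IntegrityError as e:
        print(bad, "->", e)  # each bad value violates a CHECK constraint
```

Same behavior as above: the lowercase and space-containing values are rejected by the database itself, with no application code involved.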

Constraints are awesome. The very definition of constraints is awesome. Some of the best ideas come to you when you are constrained by something, usually time. OK, maybe not time in software development. Time constraints in writing, I've heard, are pretty cool. Limit the colors an artist has available and see what they come up with. Fewer choices sometimes mean better results.

Typically I'll constrain the crap out of a data model. With development and testing, some of those will be relaxed. Some will be added.

The point is, don't be afraid of them. Constraints are a very good thing indeed.

Monday, July 20, 2009

How To: Users and Roles (Simple)

I'm using Apex again, which is nice. I love the database work; cleaning, refactoring, etc. but there's just something very cool about being able to put a "face" on that database stuff.

I'm mostly done with the application and I'm going through adding in security. It's not an afterthought; I'm just not going to add it first, as that would make my life very difficult.

I would love to use Database Authentication, but it only checks to see if you have an account. Your database roles do not carry over into Apex (yet). I don't want to use Apex Authentication mostly because I've never used it.

I could create my own table based security, but not yet. I'm trying to keep it as simple as possible for now.

I decided on a mix of Database Authentication and table based users and roles. I will not be storing passwords nor writing any custom authentication. I'll just rely on Oracle to do that. I will however capture the username and put it in an Application Item. In the Administrator's interface, the users will be able to create users...but only if they already exist in the database. I will be creating the following roles:

--ADMIN - this role can do anything. full access to the application.
--SECTION_DEVELOPER - as it stands, we can not completely hand this off to the business. There will be some intervention required by IT. This role should be able to do all that the business person can do and a couple of extra actions.
--SECTION_ADMIN - this will be the role assigned to the business person.
--TAB_DEVELOPER - similar to SECTION_DEVELOPER, just another section of the application
--TAB_ADMIN - same as SECTION_ADMIN, assigned to business user

Perhaps an easier way to visualize it:

Here's the table to hold the roles:
CREATE TABLE t_roles
(
  role_name VARCHAR2(30)
    CONSTRAINT pk_rolename_troles PRIMARY KEY,
  parent_role_name VARCHAR2(30)
    CONSTRAINT fk_rolename_troles REFERENCES t_roles( role_name )
);

INSERT INTO t_roles( role_name ) VALUES ( 'ADMIN' );
INSERT INTO t_roles( role_name, parent_role_name ) VALUES ( 'SECTION_DEVELOPER', 'ADMIN' );
INSERT INTO t_roles( role_name, parent_role_name ) VALUES ( 'SECTION_ADMIN', 'SECTION_DEVELOPER' );
INSERT INTO t_roles( role_name, parent_role_name ) VALUES ( 'TAB_DEVELOPER', 'ADMIN' );
INSERT INTO t_roles( role_name, parent_role_name ) VALUES ( 'TAB_ADMIN', 'TAB_DEVELOPER' );
I'm creating a view on top of this table because I want to use some of the hierarchical capabilities.
CREATE OR REPLACE
VIEW v_roles
AS
SELECT
  role_name,
  parent_role_name,
  SYS_CONNECT_BY_PATH( role_name, '/' ) role_path,
  LEVEL role_level,
  RPAD( '-', LEVEL * 2, '-' ) || role_name display_role_name
FROM t_roles
START WITH parent_role_name IS NULL
CONNECT BY PRIOR role_name = parent_role_name;
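As a sanity check on what SYS_CONNECT_BY_PATH produces, here's a minimal Python model of the same path building. The parent assignments below are my assumption, inferred from the role descriptions above, not taken verbatim from the post.

```python
# Assumed role hierarchy: role_name -> parent_role_name (None = root).
roles = {
    "ADMIN": None,
    "SECTION_DEVELOPER": "ADMIN",
    "SECTION_ADMIN": "SECTION_DEVELOPER",
    "TAB_DEVELOPER": "ADMIN",
    "TAB_ADMIN": "TAB_DEVELOPER",
}

def role_path(role):
    """Walk up the parents to the root, like SYS_CONNECT_BY_PATH( role_name, '/' )."""
    parts = []
    while role is not None:
        parts.append(role)
        role = roles[role]
    return "/" + "/".join(reversed(parts))

print(role_path("TAB_ADMIN"))  # /ADMIN/TAB_DEVELOPER/TAB_ADMIN
```

Each role's path starts at the root role and ends at the role itself, which is what makes the position-comparison trick later in this post possible.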
I created a function, is_authorized, which took in the username (Application Item) and a literal: ADMIN, SECTION_DEVELOPER, SECTION_ADMIN, TAB_DEVELOPER, or TAB_ADMIN. It returned either TRUE or FALSE (boolean).

Here are the table definitions for the user and user/role intersection tables:
CREATE TABLE t_users
(
  username VARCHAR2(30)
    CONSTRAINT pk_username_tusers PRIMARY KEY
);

INSERT INTO t_users( username ) VALUES ( 'CJUSTICE' );

CREATE TABLE t_user_roles
(
  username VARCHAR2(30)
    CONSTRAINT fk_username_userroles REFERENCES t_users( username )
    CONSTRAINT nn_username_userroles NOT NULL,
  role_name VARCHAR2(30)
    CONSTRAINT fk_rolename_userroles REFERENCES t_roles( role_name )
    CONSTRAINT nn_rolename_userroles NOT NULL
);

INSERT INTO t_user_roles( username, role_name ) VALUES ( 'CJUSTICE', 'ADMIN' );

CREATE OR REPLACE
VIEW v_active_user_roles
AS
SELECT
  ur.username,
  r.role_name,
  r.role_path
FROM
  v_roles r,
  t_user_roles ur
WHERE r.role_name = ur.role_name;
Now I need a FUNCTION that will tell me whether this person is authorized or not.

I struggled with this a little bit (hence the post).
l_count INTEGER;

SELECT COUNT(*)
  INTO l_count
  FROM v_active_user_roles
  WHERE username = :p_username
  AND role_name = :p_role_name;

IF l_count = 1 THEN
  RETURN TRUE;
END IF;
OK, so if the code from Apex is passing in CJUSTICE for the username and ADMIN for the role, it works. But ADMIN is the all powerful user. So I need another check in there:
l_count INTEGER;

SELECT COUNT(*)
  INTO l_count
  FROM v_active_user_roles
  WHERE username = :p_username
  AND role_name = :p_role_name;

IF l_count = 1 THEN
  -- additional check: is the user's actual role ADMIN?
  SELECT COUNT(*)
    INTO l_count
    FROM v_active_user_roles
That's where I kind of got stuck. The extra SELECT COUNT(*) would have needed at least one more additional IF THEN control statement. Time to step away.

Thinking about it, it might be easier if ADMIN were the bottom-most...leaf? In other words:

Using SYS_CONNECT_BY_PATH it would have been easy to check. I think.

If ADMIN is at the bottom though, you'll have to jump through all kinds of code hoops to keep it at the bottom. That's the indicator to me that I'm overthinking it.

What about a loop, using ROLE_PATH (SYS_CONNECT_BY_PATH)? Something like this:
    FOR i IN ( SELECT role_path
                 FROM v_roles
                 WHERE INSTR( role_path, l_rolename ) > 0 )
    LOOP
      IF INSTR( i.role_path, l_rolename ) < INSTR( i.role_path, p_rolename ) THEN
        RETURN TRUE;
      END IF;
    END LOOP;
If the user's role (ADMIN) is at a place closer to the front of the string and the passed-in role (TAB_ADMIN) is further back, the user will be authorized to see/view/execute whatever the authorization is applied to. Here's the code all put together:
FUNCTION is_authorized
  ( p_username IN VARCHAR2,
    p_rolename IN VARCHAR2 )
  RETURN BOOLEAN
IS
  l_count    INTEGER;
  l_rolename VARCHAR2(30);
BEGIN
  --1. see if the user/role exists
  --1a. if true, return true
  --1b. if false
  --1b1. find the role the user is assigned to
  --1b2. see what roles inherit from the user's role
  SELECT COUNT(*)
    INTO l_count
    FROM v_active_user_roles
    WHERE username = p_username
      AND role_name = p_rolename;

  IF l_count = 1 THEN
    RETURN TRUE;
  END IF;

  BEGIN
    SELECT role_name
      INTO l_rolename
      FROM v_active_user_roles
      WHERE username = p_username;
  EXCEPTION
    WHEN no_data_found THEN
      RETURN FALSE;
  END;

  FOR i IN ( SELECT role_path
               FROM v_roles
               WHERE INSTR( role_path, l_rolename ) > 0 )
  LOOP
    IF INSTR( i.role_path, l_rolename ) < INSTR( i.role_path, p_rolename ) THEN
      RETURN TRUE;
    END IF;
  END LOOP;

  RETURN FALSE;
END is_authorized;
/
show errors
It's certainly not the best solution in the world. When I have a bit more time, I'll certainly revisit it.
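For what it's worth, the INSTR position check can be modeled outside the database. Here's a small Python sketch of the same comparison; the role paths are assumptions shaped like the hierarchy described above, and note that it shares the substring-matching fragility of the INSTR approach.

```python
def is_authorized(user_role, requested_role, role_paths):
    """True if user_role sits closer to the root than requested_role in
    some path, mirroring INSTR(path, user) < INSTR(path, requested)."""
    for path in role_paths:
        u = path.find(user_role)       # like INSTR: substring position
        r = path.find(requested_role)
        if u != -1 and r != -1 and u < r:
            return True
    return False

# Assumed paths, shaped like SYS_CONNECT_BY_PATH output for the roles above.
paths = [
    "/ADMIN/SECTION_DEVELOPER/SECTION_ADMIN",
    "/ADMIN/TAB_DEVELOPER/TAB_ADMIN",
]
print(is_authorized("ADMIN", "TAB_ADMIN", paths))  # ADMIN outranks TAB_ADMIN
print(is_authorized("TAB_ADMIN", "ADMIN", paths))  # the reverse does not hold
```

Because plain substring matching can confuse role names that contain other role names (ADMIN appears inside SECTION_ADMIN, for example), the position trick is brittle, which is likely part of why this function gets revisited.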

Integration with Apex
Since I haven't worked with Apex regularly since 2.2, I started to use the Conditions section to store the code. What that would mean is possibly storing
RETURN is_authorized( :F_USERNAME, 'ADMIN' /*(or others)*/ );
in about 60 locations throughout the application. Then I saw this:

I had completely forgotten about that! Authorization Schemes allow you to define as many...schemes as you want. I created 5, one for each role.

Added bonus, when I need to change these, I only have to do so in one place.

You can add Authorization Schemes to almost everything. Pages, HTML Regions, Buttons, Items, Columns in reports, etc. As fine a granularity as you would ever need.

Update: 08/07/2009
This function does not work! Sue me. There is a later post addressing the issues I found, Users and Roles - Revisited.

Wednesday, July 15, 2009

PMDV: Lessons Learned

On Monday I "released" Version 0.1. Not only did I blog about it, I sent private emails to people I know asking them to take a look. I'm going to summarize the good and the bad here.

I had planned all along to put the objects in the SYS schema as Oracle treats it differently. By differently I mean that it's more secure than any other user/schema. You cannot create objects in the schema as another user, even with DBA privileges. Same goes for executing code; you need an explicit GRANT from SYS in order to run a package. The specifics escape me, but somewhere around 9i, DBMS_SESSION needed to be granted explicitly to users.

What's the lesson learned? Well, if you create objects in the SYS schema you now have an unsupported database. Yikes. I don't want that to happen and I will be updating the instructions tonight to reflect that (not that anyone has gone and thrown it in production yet).

What's in a Name?
Poor Man's Data Vault was not the best choice of names. I knew that all along really, but stuck with it just because. Someone suggested that the general concept might be related a bit more to Audit Vault; my only exposure to it was at COLLABORATE.

If there are any good ideas out there, let them fly. The only constraint? You have to use the letters P, M, D and V. Why? Because I've prefixed everything with that. :)

Documentation is hard, but I tried to get down as much as possible on the wiki. A friend and former colleague (twice) told me it sucked. It wasn't very descriptive or clear as to the purpose. I don't think it sucked, but it wasn't good or even great. Mediocre maybe?

Reason for Being
Maybe writing software is hard? Not the coding part necessarily, all the "extraneous" stuff. Documentation. User Expectations. Documentation. Testing (what?). Documentation. Critiques.

Despite the "smallness" of the project, I have a better appreciation for what Jake talks about over at the AppsLab. I can only imagine what it's like doing it at that scale.

Final Thoughts
I think I will continue on with the project.

I believe the goals have changed a bit so I'll need to adjust. Actually, I'll need to refactor my goals.

I will try to use the criticism wisely and let it help shape the new goals.

I'll have to change the target audience...well, define one. I had bigger shops (multiple DBAs) in mind initially or a place that needed to be SOX compliant. Maybe it should be small or medium size shops? I don't know.

Don't be afraid to leave comments, good and bad.

Tuesday, July 14, 2009

DBA: Poor Man's Data Vault

Nothing against women, so don't start with me. By "Man" I mean (hu)Man. Though there are plenty out there that would argue against any DBA, in any way, resembling a human.

Back in March I started this project on Google Code. Here I am 4 months later and I've finally gotten around to it.

What is it?
Version 1.0, or better, Version 0.1, will simply require you to set a context prior to deployment. All that means is this:
BEGIN
  pmdv_work_pkg.create_work
  ( p_name        => 'TICKET #44',
    p_description => 'CLEANING UP CJUSTICE SCHEMA' );
END;
/
What's the big deal with that?
Well, it depends. The idea was originally sparked by watching my DBAs deploy my projects at a previous job. After deployment, they would run this script, enter the ticket number, start and stop time, and it would spool off this report. I never looked at the report, but guessed that it had something to do with Security or Auditing. So I made the grand leap to assume it was all the objects that were affected by the deployment. It was all an assumption though, as I was only occasionally asked by the Security team to provide details on objects that I deployed.

What does it do?
It's pretty simple actually. One trigger, ON DATABASE BEFORE DDL, is set up. That calls a package which checks the relevant exceptions and, if none of them apply, determines whether the context has been set. If an exception applies or the context is set, you can deploy your changes (DDL). If not, you get a pretty error message like this:
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-06512: at line 14
When you first create your work (order), the start date and end date are set (end_date defaults to 30 minutes, but time can be added if necessary). Also, all INVALID objects are captured prior to deployment. Once you are finished, you issue the following commands:
CJUSTICE@TESTING>EXEC pmdv_work_pkg.end_work;

When you issue the END_WORK command, the INVALID objects are captured again (DURING), the work record is updated, then any objects changed during the deployment window are captured (from DBA_OBJECTS.LAST_DDL_TIME).

For those of you worried about putting such a thing in production, I am also providing a trigger (AFTER DDL) and a table to capture the DDL in your environment. That way you can note the exceptions and add them to the code before ever throwing such a thing in production. Not that I think many of you would do that...especially without more rigorous testing at the very least.

Testing is my next step. I know, it's backwards. I just want this out the door. I want someone to look at it and say, "Hey Chet, that's a piece of crap." or "You're nuts if you think I would ever put this on a production system!"

I hear you. Testing will be done. Test cases will also be provided so that you can verify the results as well. Hopefully, if enough of you nit-picking, anal-retentive DBAs take a look at it and critique it (good and bad), I can make something of it. Something that would be useful to everyone. So go crazy on it please.

Here's the package spec for PMDV_WORK_PKG:
Argument Name Type In/Out Default?
------------------------------ ----------------------- ------ --------
CAN_DEPLOY_DDL is used by the trigger, PMDV_BEFORE_DDL to determine whether or not the DDL is allowed. Currently, I have 3 exceptions:
1.  If the ORA_LOGIN_USER IS NULL. I noticed this behavior on a 10g instance. DDL OK.
2.  By default Oracle does not allow objects to be created in the SYS schema. Unless your system is compromised in some way, it is reasonably safe to assume that if the ORA_DICT_OBJ_OWNER = SYS, DDL is OK.
3.  PMDV_CONTEXT is set. DDL is OK.
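The exceptions above boil down to a short decision flow. Here's a rough Python model of it; the function names, the session-keyed window bookkeeping, and the default arguments are illustrative, not the actual PMDV code.

```python
from datetime import datetime, timedelta

work_window = {}  # session_id -> (start, end) deployment window

def create_work(session_id, minutes=30):
    """Open a deployment window, loosely like PMDV_WORK_PKG.CREATE_WORK."""
    now = datetime.now()
    work_window[session_id] = (now, now + timedelta(minutes=minutes))

def can_deploy_ddl(session_id, login_user, obj_owner):
    if login_user is None:   # exception: no login user (the 10g behavior noted above)
        return True
    if obj_owner == "SYS":   # exception: Oracle itself creating SYS-owned objects
        return True
    window = work_window.get(session_id)
    if window is None:       # no context/window set: block the DDL
        return False
    start, end = window      # context set: allow DDL only inside the window
    return start <= datetime.now() <= end

create_work("S1")
print(can_deploy_ddl("S1", "CJUSTICE", "CJUSTICE"))  # window open: allowed
print(can_deploy_ddl("S2", "CJUSTICE", "CJUSTICE"))  # no window: blocked
```

The real implementation makes the same decision inside the BEFORE DDL trigger, using the application context rather than an in-memory dictionary.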

If you see anything in that list that jumps out at you, please let me know. I absolutely do not want to introduce something into the wild that could be harmful.

CREATE_WORK gets it all started. It accepts 2 strings; they can be anything, there is no validation check on them. P_ESTIMATE_TIME defaults to 30 minutes. You can create the work (order) with the default, or give yourself an hour, or however much time you need really.
This ends your session, captures INVALID and CHANGED objects and clears the context.

Kind of self explanatory. Run this in the same (SQL*Plus) session to extend your window.

When you issue the CREATE_WORK statement, the PMDV_LISTENER job is enabled and runs once a minute to check if your session has expired. If you finish in 5 minutes, you don't necessarily have to issue the END_WORK call. The job will expire the work (order) after the allotted time.
RECONNECT allows you to open up another session and re-enable the context (i.e. allow you to deploy). You can find this information in PMDV_WORK.CLIENT_IDENTIFIER column.

Since I'm using DBMS_SESSION.UNIQUE_SESSION_ID, you can only have 1 deployment per SQL*Plus session. All you need to do to get around that is DISCONNECT and login again.

For the love of Pete, give me some feedback. Is this something you could ever use? Am I silly (well, you don't have to answer that specifically) for thinking this up? Any and all comments will be most appreciated. I've spent a significant amount of time over the last couple of weeks getting this together...hopefully to some end.

Where do I find this?
The code can be found here. I'll be wrapping it up all nice and pretty over the next couple of days. I'll create a download package as well. The wiki page can be found here.

Oh yeah, did I mention, I built an Apex front end for it?

Here's what that looks like.

The Work (Order) Screen:

And the Work (Order) Details Screen:

Thursday, July 9, 2009

OEL: VirtualBox Guest Additions

Thankfully, I managed to get Oracle (the Database) installed on Oracle (the Enterprise Linux). No easy task for a rookie.

Like most people, I don't like the 1024x768 default resolution, so I tried to install the VirtualBox Guest Additions. I've tried like 22 times now. Here are the 2 different errors I would find:
[root@cent-1 VBOXADDITIONS_2.2.4_47978]# /bin/bash ./
Verifying archive integrity... All good.
Uncompressing VirtualBox 2.2.4 Guest Additions for Linux installation
VirtualBox 2.2.4 Guest Additions installation
This system does not seem to have support for OpenGL direct rendering.
VirtualBox requires Linux 2.6.27 or later for this. Please see the log
file /var/log/vboxadd-install.log
if your guest uses Linux 2.6.27 and you still see this message.
/var/log/vboxadd-install.log looks like this:
Installing VirtualBox 2.2.4 Guest Additions, built Fri May 29 19:13:34 CEST 2009

Testing the setup of the guest system

Could not find the Linux kernel header files -
the directories /lib/modules/2.6.18-128.el5/build/include
and /usr/src/linux/include do not exist.
Giving up due to the problems mentioned above.
I googled "OpenGl direct rendering" and found this entry. The suggestion was to add "x11" to the end like this:
/bin/bash ./ x11
Tried that out and I got a different error, but one that matched what the log says, something like "install kernel headers" blah blah blah. Greek.

I tried using apt-get, which apparently doesn't exist on OEL (or Red Hat derivatives). Ubuntu has dpkg...which doesn't work in OEL either. yum! Let's try yum. Where do I find the kernel headers?



I wonder if they exist on the iso images?

So I load up Enterprise-R5-U3-Server-i386-disc1.iso, navigate to the Server directory:
(Yes, it's a picture. I can't copy out of the guest yet...shut up).

Let's try kernel-headers (couldn't be that obvious could it?)
[root@oracledb Server]# rpm -ivh kernel-headers-2.6.18-128.el5.i386.rpm
warning: kernel-headers-2.6.18-128.el5.i386.rpm: Header V3 DSA signature:
NOKEY, key ID 1e5e0159
Preparing... ################################### [100%]
package kernel-headers-2.6.18-128.el5.i386 is already installed
[root@oracledb Server]#
With a tried and true method, I select the next one, kernel-devel. I'm not typing this one up. I guess I should put the name here, in case someone is googling for it: kernel-devel-2.6.18-128.el5.i686.rpm

That one installed. Unmount the OEL disc, remount the VBOX Guest Additions, /bin(g)/bash(bang) ./VBOXADDITIONS_2.2.4_47978(boom) and I'm done. Now I just need to reboot to see if it really works.

Why can't it just work?

Wednesday, July 8, 2009

Oracle Concepts: The Data Dictionary

I have come to rely pretty heavily on the Data Dictionary as I don't typically use a GUI. This stems from the fact that in my first professional IT job (reporting off of Oracle), I was given a tnsnames.ora file and something called SQL*Plus. I came from Microsoft Access...I wanted pretty pictures of my database objects. My first year was spent asking how to find the tables in a schema (schema? what's that?), how to view the SQL that made up a view, and how to view the source code.

Find Tables
SELECT * FROM user_tables;
View Text
SET LONG 100000
SELECT text FROM user_views WHERE view_name = :your_view;
View Source
SELECT line, text 
FROM user_source
WHERE name = :package_name
AND type = :package_spec_or_package_body;
Needless to say it wasn't fun.
Without pain there is no progress.
What is the Data Dictionary? From the Concepts Guide: "One of the most important parts of an Oracle database is its data dictionary, which is a read-only set of tables that provides information about the database." A data dictionary contains:

* The definitions of all schema objects in the database (tables, views, indexes, clusters, synonyms, sequences, procedures, functions, packages, triggers, and so on)
* How much space has been allocated for, and is currently used by, the schema objects
* Default values for columns
* Integrity constraint information
* The names of Oracle Database users
* Privileges and roles each user has been granted
* Auditing information, such as who has accessed or updated various schema objects
* Other general database information
Not completely read-only as I found out once. What's the point? None really. I just like being able to copy and paste the documentation. OK, that's only partially true.

All the information that you retrieve via your nice GUI interface comes from these tables. SQL Developer, JDeveloper, Toad, etc. I believe some even have an option to view the SQL being submitted to the database. If you haven't used (knowingly) the Data Dictionary before, check it out. You can easily view all 651 views (10gR2) by issuing:
SELECT * FROM dictionary;
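If you're hunting for a particular view, DICTIONARY can itself be filtered; it has just two columns, TABLE_NAME and COMMENTS. A quick sketch (the LIKE pattern is just an example):

```sql
-- Find the dictionary views that describe constraints,
-- with Oracle's own one-line description of each.
SELECT table_name, comments
FROM   dictionary
WHERE  table_name LIKE '%CONSTRAINT%'
ORDER  BY table_name;
```

The same trick works for any keyword (TAB, SOURCE, INDEX, and so on), which beats scrolling through all 651 rows.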
Happy hunting.

Oracle: Getting Started

A friend of mine recently expressed interest in learning Oracle.

Here's what I sent him:
Start here:

Oracle Concepts Guide -

You don’t have to read everything, but it’s probably the best place to start. I’d be more than happy to answer some of the more “inane” questions (How do I do this? How do I connect? Etc).

Oracle Documentation (List of Books) -

Don’t be overwhelmed by that. It’s ridiculous how much stuff is out there. I’ll list some books by discipline:

--  2 Day DBA -
----  Easiest place to start (after the Concepts Guide)
--  Oracle Database Administrator’s Guide -
----  Definitely getting into the nitty gritty stuff here.
--  PL/SQL Language Reference -
----  That should help you get started with PL/SQL if you so choose

For either role, the bible is probably the SQL Language Reference manual:

Usually what I do is just download the software and try to get it running. It’s super easy on Windows, a bit more difficult on *nix environments (I just recently installed Oracle on Oracle Enterprise Linux, my first go at a non-Windows environment).

The documentation above is for 11g, the latest version. You can get the software here: I’d suggest installing EE, might as well go full boat. Plus, it doesn’t really make a difference resource-wise. If you have 1.5 to 2GB RAM, you should have plenty. You can run an instance at about 512MB.

It requires an account (free) and they do not spam.

Also, try Oracle Database XE, a free (as in beer) standalone edition built off of the 10.2 kernel. Application Express is used as the front end (a pretty sweet GUI tool).

Hope that doesn’t scare you off too much. ;)
Any other pointers? I know there are tons of books so link them up if you feel they are worthy.

Tuesday, July 7, 2009

Classic: Application Developers vs. Database Developers II

The original (with all the fun comments) can be found here. Originally posted on December 9, 2008. This is the "followup" to yesterday's post.

You can read the first article here. My application developer friend, Mr. M, emailed me and another fine gentleman this little blurb recently:

Mr. M:

Hey seriously, what a tub of shit Oracle is. Where does this myth come from that it's such a great platform? Their client tools suck balls and it's generally just a pain in the ass to work with from a developer's point of view. But devs for some reason are under this impression that from the server perspective it's rock solid and performant. Well, it may be solid, but it's a fucking turd. Our dba here - definitely an Oracle guy - has been tasked with looking into moving us onto an oss db. He basically refuses to even look at MySQL, stating that it's a mickey mouse worthless pile of shit (can't really argue with him there lol), so that basically leaves Postgres. So it's been a few weeks now, and he will basically admit now that Postgres completely waxes Oracle as far as performance goes. LOL We run RAC in production too. He's looking at failover, replication, blah blah blah now, we'll see what the verdict on that is. Oh, and Oracle AQ? That's a worthless pile of shit too. Why do they tack a fucking message queue that doesn't play nice with jms onto the fucking database, which is probably already overworked? Oh wait, that's right, they're in the business of selling per cpu licenses! Cocksuckers.
This was prompted by a recent Oracle email blast about the Exadata storage system/Warehouse.

As I did before, I'll just put the email here.

Agreed, their client tools aren't all that great. Which ones are you using?

I use SQL*Plus (naturally), SQL Developer and JDeveloper. The latter 2 tie in very nicely with Subversion. With JDeveloper, when I want to try out Java, it's got a pretty good suite of tools.

Oracle started out catering to businesses; Microsoft started with the consumer. Oracle has made pretty good strides in making their software more usable, while Microsoft has made theirs more scalable.

I haven't used AQ a whole lot and definitely not with Java. I do know that it's billed as JMS compliant.

Postgres has its place and so does Oracle. It is a great platform if you know how to leverage it. PL/SQL is a 3GL (or is it 4) and you can write entire applications using it...and if you do that right, an incredibly robust and scalable application at that.
Mr. M:

"It is a great platform if you know how to leverage it. PL/SQL is a 3GL (or is it 4) and you can write entire applications using it...and if you do that right, an incredibly robust and scalable application at that."

NO!!! NO!!! NOOOOO!!!

I want to beat people like you who say this with a ball-peen hammer. There are only a select few individuals on this earth who can write and read application logic written in SQL. AVOID THIS ANTI-PATTERN AT ALL COSTS! What is it with you f_cking database guys??? Why do you always want to stuff as much crap into the db as possible?


It's fine for querying and manipulating sets of data, in a relational database. But it is a worthless sack of shit for expressing application logic!

I'm having to dig through this f_cking abortion right now because some Oracle f_ckhead thought "you can write entire applications using it!" Blog that, mofo!
This was followed by a package he had been working on. I wouldn't say it was the greatest, but it wasn't all bad either.

goodness gracious.


disagree (naturally). It's incredibly easy to do, you just don't know how yet...and it seems even the Oracle professionals out there don't either.

I'll tell you this, the crazier the SQL or PL/SQL needed to derive and manipulate data the poorer the design. Start with a good design and it all becomes very simple.
Of course note the use of "naturally" in my lexicon. Thanks Jake.

Mr. M:
well dude, we are back to our old discussion - you arguing that procedural sql code is perfectly fine for building apps, and by extension, that the last 20 years of computer science have basically been a misguided lost journey down the meandering, fruitless trail of oop. select this from that. otherwise keep that sql crap caged up where it belongs.

btw, do the db guys here suck? seriously. i'm not competent enough to judge. (to be fair, apparently that crap i sent you is fairly old stuff, but still....)
I would say, based on limited knowledge of software development, that the OOP movement was started because the database (specifically Oracle) was not mature enough to do what was needed. Plus, I seem to recall that the OOP movement was supposed to have solved all the world's problems by now.

It's further propagated due to the needs you specified in our discussion that day at WellCare (i.e. performance). I still believe that if application developers better understood how a DB works (Oracle, MySQL, SQL Server, etc.) they would end up writing less code. Database constraints alone force you to write less (and better) code while simultaneously ensuring good data.

The code that I did look at (first 1000 lines or so) isn't great.
1. With all those IF THEN ELSE statements it's telling me that there's probably a better way to store the data. Looks like they're missing an attribute that should be contained with a track.
2. using Object (PL/SQL) types to store data in the way they seem to be doing it is not the best way. Again, probably a design issue.
3. When you do something like this:

UPDATE pb_album_metadata
SET    primary_digital_flag = 0
WHERE  album_id IN (SELECT b.album_id
                    FROM   (SELECT a.album_id,
                                   MAX(a.album_id) OVER (PARTITION BY a.standard_upc) AS latest_album_id
                            FROM   pb_album_metadata a
                            WHERE  a.standard_upc = g_album_tab(1).standard_upc) b
                    WHERE  b.album_id <> b.latest_album_id)
AND    primary_digital_flag <> 0;

They should probably have considered end_date as an attribute of the album metadata. While analytic functions are pretty cool, they're more for analyzing (OLAP) and not OLTP environments.

That's for starters and without table definitions...
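For contrast, the same "clear the flag on everything but the newest album per UPC" rule can be written without the analytic pass. A sketch, assuming the same pb_album_metadata columns quoted above (the table itself is from Mr. M's code, not mine):

```sql
-- Same effect as the windowed version: any album that is not the
-- highest album_id for its standard_upc loses the flag.
UPDATE pb_album_metadata m
SET    primary_digital_flag = 0
WHERE  m.primary_digital_flag <> 0
AND    m.album_id < (SELECT MAX(m2.album_id)
                     FROM   pb_album_metadata m2
                     WHERE  m2.standard_upc = m.standard_upc);
```

Better still, as I said above, would be an end_date (or current-row flag) on the table itself so no query has to re-derive "latest" at all.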
Me (again):
oh yeah...and PL/SQL is/was built on top of Ada, FYI.
Mr. M:
"I still believe that if application developers better understood how a DB works (Oracle, MySQL, SQLServer, etc) they would end up writing less code. Database constraints alone force you to write less (and better) code simultaneously ensuring good data."

Huh? What are we not understanding? What would be an example of a constraint that would force us to write less and better code?
CHECK, NOT NULL (same as CHECK) and FOREIGN KEY constraints all fit into that category.

Instead of having to check if a value is NULL when submitting it to the database, just submit and let the database throw an error, which you'll conveniently capture in your Exception block. Same goes for CHECK constraints: if columnA should be one of three values (test, test1, test3), you won't have to check that in your application code, just catch the exception. FKs ensure good data (proper relationships).

A different perspective. If you are going to pay for it, use that damn thing. If you don't care, don't do it. But over the course of an "enterprisey" application, you'll end up writing more code and making maintenance that much more difficult (did I do it there?). Just those constraints will force you and the teams of application developers to put good data into the database.

You can still do it in the application of course (form validation is a good place)...
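To make the constraints I mean concrete, here's a sketch; the table and column names are made up for illustration, and the CHECK list reuses the (test, test1, test3) example from above:

```sql
CREATE TABLE order_items (
  order_item_id NUMBER       PRIMARY KEY,
  order_id      NUMBER       NOT NULL
                REFERENCES orders (order_id),  -- FK: no orphan rows
  status        VARCHAR2(10) NOT NULL
                CHECK (status IN ('test', 'test1', 'test3')),
  quantity      NUMBER       NOT NULL
                CHECK (quantity > 0)
);

-- A bad row never gets in: this raises ORA-02290 (check constraint
-- violated). A NULL status would raise ORA-01400 instead. The app
-- catches the exception rather than re-implementing the rule.
INSERT INTO order_items (order_item_id, order_id, status, quantity)
VALUES (1, 42, 'bogus', 1);
```

One set of rules, declared once, enforced for every application that touches the table.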
Mr. M:
Ahh, jeez dude, I wasn't sure if you were referring to the literal "constraint" or not.

Dude, even f_cksticks like redacted I think have a decent enough understanding of when and how to use db constraints. It's when you get into things like cursors or cost plans of subselects and anonymous tables (i think that's the name for it - where I say select something from (select some other crap). Then we defer to db gurus like yourself.

But dude....."you won't have to check that in your application code, just catch the exception".......uh, don't ever repeat that off our little email group. And definitely don't go posting that on your blog. F_ck me man, it's a damn good thing we keep you db folks caged up in that rdbms box....
So we've had this discussion at work...for a high transaction system, do Java/C/etc handle exceptions well or what?

Why is it bad to deal with exceptions rather than coding to avoid them?

I highly doubt even redacted understood database constraints...he and his cohorts believed all database software would be commoditized and MySQL would be king in short order.
Mr. M:
"for a high transaction system"

Or for any system really....

To use your example of check constraints (is this value one of....) or not null constraints, checking these rules in the Java code and in the database code would seem to violate DRY. But we do that a lot, and it is acceptable in certain cases. For instance, we also probably violate DRY if we're checking this same rule say in Javascript on the front end. But we accept this tiny violation of DRY because it tends to make for a better user experience and as a performance gain too, for we avoid the round trip to the server. Now, what you're advocating here is close to the same thing. You're basically saying, don't check for the not null constraint in Java code, just go ahead and hit the database, let the database throw back an exception to the Java code, presumably correct the problem, and then make another roundtrip to the database again. Dude, what are you thinking?!? This to say nothing of the fact that this also could be considered a violation of Fail Fast, and a violation of Defensive Programming - what happens if the dba forgot to add the not null constraint in production?

Dude, listen to this guy. For a "high transaction system" basically you want to treat the database, as much as you can, as just a dumb data holder. A f_cking dumpster, that you just throw sh_t into and pull shit back out, using no vendor-specific features if at all possible.

Again, we've had this discussion, but even in everyday apps, not just on Wall Street, the database is the bottleneck. And the database is the hardest layer to scale. So given those facts, you should design your app to do as little work in the database as possible.
I was laughing at this point because the link above points to one of our consulting architects (I'm not really sure what his role is at this point).
i agree in any application that you want to minimize the number of round trips...

shocker...he's one of our architects. he's spot on in many instances, but...

database is the bottleneck because people don't know how to write SQL. I'll certainly concede the wall street applications (for the time being anyway), but the rest of us with what we do on a daily basis...Oracle will scale far beyond the demands they place. When that bottleneck shows up, 9 times out of 10 some dumb-ass c#/java guy thought he could write a better query than i. besides, what's the idiot doing anything but a simple join anyway? probably poor data model to start with...and we're right back where we started (sort of).
Mr. M:
"database is the bottleneck because people don't know how to write SQL.....some dumb-ass c#/java guy thought he could write a better query than i."

Dude, I'll grant you, people don't know how to write SQL, myself included. But that's not always why the database is the bottleneck. I think it's safe to say that's not even the majority of the reason. Yes, there are apps written by people who were just idiots, and they needlessly pummel the database, but that's definitely not the majority of scenarios. At my work the database is the bottleneck, and we run RAC in production. It's my understanding that even with RAC, there is a limit to how much you can scale that out. But any layer up from the database we are basically unlimited in how much we can scale that out. So it's very easy to stick another Apache box in front, or fire up another Weblogic server. But I can't do that with the database. We have 24 Weblogic servers for redacted. The database is the bottleneck. And we don't have shitty sql code in the app. In fact, we have very few hand-written queries anywhere in the app. Persisting something to a database is really a low-level concern that as an application developer I shouldn't even have to be bothered with, except for the rare corner cases where the persistence abstraction I'm using is just a little bit too generic to handle things effectively. And we don't use these ORMs because we don't know how to write sql. To effectively use an ORM tool requires a deeper understanding of sql and databases than simply being able to write solid SQL code. (Not saying Java devs who use ORMs know SQL better than a dba, just that it requires better sql skills than is required of a Java dev to simply write JDBC/SQL.) Now, before you try to tell me that my ORM library isn't able to write as efficient of sql code as a dba of your caliber, keep in mind that ORM tools are pretty advanced. They're able to intelligently do things like batch sql updates, and let you apply transactional semantics much easier than with raw jdbc/sql. 
But the overwhelming reason developers have so thoroughly adopted ORM is because Structured Query Language is such a nasty piece of shit for expressing application logic. SQL is a declarative, procedural language. It's totally unsuited for writing application logic! This, more than anything else, is why organizations and dev teams should seek to restrict what is handled within a relational database as much as possible - because the programming interface to it is a fucking ancient backward dying dinosaur.
Mr. V (note, not Mr. M):
My 2 canadian cents:
The polyglot approach "... use different languages for different domain ..."
Database was developed to manipulate data and should remain there.
General purpose language was developed to encapsulate logic and should remain in that domain.
You should not use DB to encapsulate business logic (in my opinion) no more than you would use HTML to create complex logic.

While Java, C#, Python, etc are described as General Purpose languages, they, too, are really domain-constrained. Their expressiveness is confined (and should be) to expressing low-level, terse, explicit, verbose, and repetitive logic (if that makes any sense). Languages such as these are more suitable for low-level abstraction on top of which richer, more expressive languages can be built. We are seeing this now with the emergence of languages on the JVM (and arguably on .Net's CLR).

I think SQL as we know it will take a back seat and a smart developer somewhere will create a new domain-specific language that lives on the VM and will push the SQL expression out of the RDBMS and closer to the code that needs it. We are not there yet, but Microsoft is trying (see LINQ and all the ORM papers). This is not to say that there won't be instances where tuning in the SQL-native language won't be necessary. However, 80 to 90% of simple CRUD cases will be handled closer to the logic code that uses the data code.

Again, that's my 2 canadian cents... I could go on. But I have a meeting with redacted.
Mr. V, I believe, is a little bit more sensible. Mr. M, on the other hand, is just trying to rile me up.

Someone will probably create something like that, but it still gets at the heart of one of my arguments: many developers don't know how to use a database and thus will go to any means to circumvent it. Embrace it, I say.

Ultimately for me, it comes down to simplicity. I can write an application using PL/SQL that will scale very well for 90% of the solutions. Of course that doesn't include the "fancy" javascript/Ajax stuff needed for the UI. In my ever so humble opinion, this is a much better solution for a business in the long run.
1. You're more likely to change the middle tier than the database tier. Java, asp, Ruby, what's next?
2. Fewer moving parts means lower cost of ownership, even with the CPU costs. One person, one, was able to build and maintain a 350 page website. You'd be hard-pressed to do that with the more expressive languages.

I think all of us are purists and very passionate about what we do. It's probably the main reason we get along. I thoroughly enjoy these conversations because it does force me to think...and that's always a good thing.
Mr. V:
Haaa chet.
You sorta made my point, then diverged away. Building an app in all PL/SQL is dangerous. It's no safer than me building an app in all Java. I can build a very limited app in all Java. The moment I need to interact with other domains (UI, data, low-level native code, etc), I have to switch to something that is closer to what I am trying to do. If I need to create a UI, I will pick a ui-centric environment; when I need to talk to a DB, I will connect to the db and send in my SQL, and so forth. I will use Java as the glue to bring it all together.

In the end, we may be saying the same thing, but with a different accent. Oh well.
And that's where it ended. I must say it's always fun. Mr. M and Mr. V are both very smart individuals and I highly respect what they do. We have different perspectives...but I think they listen, if only a little, as I listen to them. Their voices creep up on me especially now...which is definitely a good thing.