Wednesday, September 26, 2007

To LOOP or Not To LOOP - Revisited

In a previous post, I pondered the decision of LOOPing (i.e. PL/SQL calls to a lookup). Given 8 days to code and a 90 hour week last week I think I came up with a decent solution.

Initially I went the easy route, writing PL/SQL functions that would populate collections and then do the matching. I tried to tune the functions so that the collection would be populated only once for each line of business and then ordering the result set that would be passed through. In my mind, this would reduce the number of cursors opened. I wasn't if that would make a difference or not, but I was going to give it a try.

Well, it was dog slow. One of the lookups calls was estimated to take days to complete (I only know this because I have gotten into the habit of using dbms_application_info.set_session_longops).

So I rethought it.

I only needed to get the keys right?

I first created a primary key on the staging table (same key that would be used in the dimension). I then created a "work" table that would reference that key and then store the other key that I needed to lookup. I also made the foreign key a unique key (one to one relationship) so that I would know immediately if I did something wrong. I then used straight SQL to populate that table. One pass for all the values that matched and then another pass with those that didn't have a matching record (padded rows).

From days to minutes...2 minutes in fact. So I applied that technique to two other lookup situations.

The code is still pretty straight-forward and adding another line of business is relatively easy (I checked this today and it took all of 10 minutes). All the fancy PL/SQL and collection stuff I had created was thrown out the window.

So I did manage to use straight SQL, the performance is phenomenal and maintenance will be easy.

It even passed muster (albeit grudgingly) with our more seasoned datawarehouse folks. Ultimately, they couldn't complain when it ran in under two hours...

Wednesday, September 19, 2007

To CONSTRAINT or Not to CONSTRAINT

I've been having these long (good) drawn out conversations with a colleague of mine recently about constraints in a datawarehouse environment. Since I come from an OLTP environment, I want to put a constraint on everything. He on the other hand wants any constraints enforced through code.

I can understand that constraints my slow down table loads, but with the volume of data we are currently using (100 million row tables), I just don't see that as a huge impact on performance. For me, writing less code is better than any minimal performance gains we might achieve.

For instance, today he told me he wanted to add a new column. The possible values would either be M or G. I asked (I'm lead on the project) him to throw a CHECK constraint on the table limiting the values that can go in that field.

For me, I guess it's a support issue and part database purist issue. Use it for more than a bucket, Oracle's expensive. It forces developers to deal with that and will immediately tell them if they have done something wrong. It also makes support a tad easier as they won't have to wonder what actually goes in the field.

We talked at length about it but never really came to a decision (i.e. I couldn't convince him).

I know that a benchmark test would be the best way to prove either way, but I don't have the time right now to do it. Perhaps when things slow down again I will.

Any suggestions out there?

Tuesday, September 18, 2007

Zero Day is NOT Upon Me

I got a brief reprieve today, I guess.

Neither of my two deployments are going through tomorrow, so I have two more weeks to go until another opportunity to screw something up. ;)

Speaking of which...
I watch SpongeBob Squarepants quite a lot with a 5 year old boy in the house. I was watching this episode the other day where Patrick (the starfish and best buddy to SpongeBob) found out his parents were coming over to visit him.

He was not very happy as his parents treated him like he was stupid. SpongeBob told him that he would help out by acting even dumber than Patrick thus making Patrick actually look smart. The act worked, but too well. Patrick, even out of earshot of his parents, kept cracking stupid jokes at the expense of SpongeBob. This resulted in SpongeBob getting mad and reminding Patrick that he was only acting stupid to help him out.

Since I put my sign up (not counting my counter up top, no one here gives me a hard time) everyone has consistently beat me up over it. Today was it for me. Yes, I brought it on myself with the sign and all that, but I'm through with it. My boss mentioned that if I had to reset my sign that I wouldn't really be there to do so as I would have been escorted out the door (he didn't say this in a mean-spirited way...he might be right next to me) and unable to reset.

I started practicing how I would answer questions about my firing in my next interview.

Interviewer: "So, why did you get fired?"
Me: "Well, I screwed up production a couple of times."
Interviewer: "How did you do that?"
Me: "The first time I deployed code from development to production."
Interviewer: "Really?! Why did you have access to production?"
Me: "..."
Me: "On the second occurence, I deployed a bug into production; I changed a non-requirement driven piece of code"
Interviewer: "What about QA? Didn't they do regression testing?"
Me: "..."

I felt a little better.

I've have never been happy about my mistakes, I tried to "man-up" and own them; perspective and context are good things though. I should learn from my mistakes and correct the behavior that led to them, that's all I can really do.

Thursday, September 13, 2007

To LOOP or Not to LOOP

I've already established my creds as a Tom Kyte follower.

Tom's mantra:

  • You should do it in a single SQL statement if at all possible.

  • If you cannot do it in a single SQL Statement, then do it in PL/SQL.

  • If you cannot do it in PL/SQL, try a Java Stored Procedure.

  • If you cannot do it in Java, do it in a C external procedure.

  • If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…


I'm faced with a new datawarehouse load, eventually it will be about 100 million records. If I go the straight SQL way (embedded in PL/SQL), I'll need to create multiple objects to support that method and each time we add a new line of business, we'll have to add more objects and more code.

If I can encapsulte it in PL/SQL, I could eliminate most of the necessary supporting objects, but at a big cost, performance. It would make coding time much quicker and ultimately more maintainable (in my opinion), adding a new line of business would simple by updating the packaged functions with an additional control check.

It's a difficult pill to swallow though. I'm originally an OLTP guy and I can write pretty good SQL. Being so new to datawarehousing concepts I feel like I am cheating somehow. Making that decision of when to move to slow-by-slow seems exceedingly difficult. Hopefully with more experience in the datawarehouse environment it will become easier...but it's still tough.

The Chicken Almost Came Home to Roost

We have a daily meeting with the Business concerning the current, long running project. This is the one I have screwed up a couple of times.

I have a scheduled deployment next week, mainly just performance improvements. I've managed to get this down from 10-12 hours to 7-8.

At the end of the meeting, the Business questions the need for the performance improvements - we're re-architecting my solution in parallel because mine was just a conversion from SAS (yuk!) - as we'll be live in just 2 months with the new one.

Obviously, I know why they're thinking that.

IT, me, can't be trusted to do it right. Since I've, umm...screwed up a couple of times. Why do this if it's not broken?

I just hung my head low, I knew, and I couldn't really argue with them. I had no ground to stand on. Our PM said that he'd take it back to our manager and let them know.

I went to lunch and thought about this blog entry I would be writing.

I got back from lunch and my PM informed me that this would be going into production...the Business' big boss, Miss VP, said so.

Woohoo! Someone has some semblance of confidence in IT (me)!

I've done everything I could to make sure that I didn't do something silly. We had a peer code review to compare the most recent build against that which was in production. My unit tests were much more thorough. I worked with QA to get them to look at specific points. Let's just hope all goes well.

This is a big test for me. Either I pass and gain some credibility back or fail and lose my job. Wish me luck!

Code Style

Having a specific coding style has become more apparent to me now that I work in a team environment. I have my way, which is of course the best, and everybody else has there way, which isn't nearly as good.

I'm sure if I could make the rules, I'd have everyone writing code according to my ways. Power corrupts and all that.

I have learned to accept others' ways though. It's been difficult at times. If the code is readable and it works, I usually just bite my tongue.

Strangely, I love to type. I manually create all of my scripts. I make my code "pretty." I don't use GUI tools to auto-generate table definitions, and I especially don't use tools to format my code.

Toad Formatter, or whatever it's called, is my enemy. I'm sure Toad is not the only one with some sort of auto-formatter and I don't like any of them. For some reason I believe it is an abomination.

When I get to do the technical interview, I scan their resume for SQL*Plus. If I don't see it, I ask them what tools they use. If they don't mention SQL*Plus, I'm skeptical of their abilities...until I get to the five constraints question.

This is not to say that people that use Toad or SQLDeveloper are bad coders or anything, it's just my preference.

So what's your preference?

Tuesday, September 11, 2007

Inline vs. Out-of-line Constraints?

Today one of my more feisty colleages and I had a discussion about constraints. I had asked him if he wouldn't mind naming the NOT NULL constraints to a particular table definition. With this guy, nothing is easy. ;-)

So he came over and we talked about inline vs. out-of-line constraints. I asked him the advantage of out-of-line. He asked me the advantage of inline. I'm sure his answer was better than mine as he is much more articulate.

For me, it's mostly a style thing, except for FOREIGN KEY constraints which, if defined inline, will inherit the data type of the parent column. That makes life easier if you ever need to change the data type of a parent key (not a recommended best practice mind you). I just think it looks prettier (factual based evidence). It's all in one file and I like to see how many constraints I can put on a single table. I'm shooting for at least one per column to save myself time down the road of coding exceptions. Let the database do it's job!

Here's a sample of inline constraints:
DROP TABLE s;
DROP TABLE t;

CREATE TABLE t
(
id NUMBER(10)
CONSTRAINT pk_id PRIMARY KEY,
name VARCHAR2(30)
CONSTRAINT nn_name_t NOT NULL,
age NUMBER(2,0)
CONSTRAINT nn_age_t NOT NULL
CONSTRAINT ck_gtzero_age_t CHECK ( age >= 0 )
);

CREATE TABLE s
(
sid NUMBER(10)
CONSTRAINT pk_sid PRIMARY KEY,
id
CONSTRAINT fk_id_s REFERENCES t ( id ),
something_unique VARCHAR2(30)
CONSTRAINT uq_somethingunique_s UNIQUE
);

And out-of-line Constraints:

DROP TABLE s;
DROP TABLE t;

CREATE TABLE t
(
id NUMBER(10),
name VARCHAR2(30),
age NUMBER(2,0)
);

ALTER TABLE t ADD CONSTRAINT pk_id PRIMARY KEY ( id );

ALTER TABLE t ADD CONSTRAINT nn_name_t CHECK ( name IS NOT NULL );

ALTER TABLE t ADD CONSTRAINT nn_age_t CHECK ( age IS NOT NULL );

ALTER TABLE t ADD CONSTRAINT ck_gtzero_age_t CHECK ( age >= 0 );

CREATE TABLE s
(
sid NUMBER(10),
id NUMBER(10),
something_unique VARCHAR2(30)
);

ALTER TABLE s ADD CONSTRAINT pk_sid PRIMARY KEY ( sid );

ALTER TABLE s
ADD CONSTRAINT fk_id_s FOREIGN KEY ( id ) REFERENCES t ( id );

ALTER TABLE s
ADD CONSTRAINT uq_somethingunique_s UNIQUE ( something_unique );


So, what are the advantages for either method? I couldn’t find much via google or asktom, but I probably just didn't search on the right terms. I find it hard to believe that this topic hasn't come up before.

* Note that I did use 5 constraint types

The Good Manager

I read Lewis Cunningham's article today What Keeps you at your Job?

This is something I think about quite often. I am in a very chaotic, immature organization currently. The process to deploy code changes about every other day and of course non of it is documented. Then there's the fact that I had complete control at my previous job, I was the DBA, Architect, Web Developer and Designer (suck at that), and most importantly Database Developer. I had a very good manager who just literally let me run wild (within reason of course).

For me, that was a perfect situation. I felt I was under-utilized at my previous job and that was the perfect opportunity to flex my muscles. I learned a great deal there and I am forever thankful for that.

One of the big reasons I took my current job was because of the chaos and the immaturity of the IT organization. There are countless opportunities to help shape the future, to build the foundation. I'd also get to experience life in the for-profit corporate world where performance is rewarded financially. There's also significant room to advance relatively quickly compared to more established environments.

I have learned things on the technical side, but far and away my biggest gain in knowledge is in how to do software development in a team environment and the peculiar politics of a company.

I have my manager to thank for that. He is a former military officer who attended one of the military academies. He has worked in our industry for a number of years and is our subject matter expert on the financial side of things.

  • He gives us (developers) the opportunity to voice our opinions.
  • He gives us a view into the politics.
  • He gives us the big picture view.
  • He is fair.
  • He does not do things just because that's the way they're done. He fights those battles so that we don't have to do it the wrong way.
  • He backs us up.
  • Shit doesn't roll downhill with him.
To me, those are all terrific qualities. When I've screwed up, he tells me; usually though, he asks me questions so that I will come to the realization. He's been an outstanding leader and most importantly (to me anyway), a teacher.

If he ever decided to leave, I might just have to follow him.

The Countdown Timer

So I found a handy little countdown timer from this site so that I can replicate my hand made sign in my cube. This all stems from a previous incident.

Now I'll have a reminder at work and on the blog not to touch that which is out of scope.

Monday, September 10, 2007

Oracle Tools I've Used

Tools:
SQL*Plus
APEX - Application Express (formerly HTMLDB)
JDeveloper
SQL Developer
Reports Builder
OC4J
Discoverer
Oracle Server 8i, 9i, 10g (Windows)
Oracle Application Server 9i, 10g (Windows)
rman
oradim
lsnrctl
tkprof

Coming Soon:
Oracle Warehouse Builder

Database Features:
Java
Object Types
Workflow
Advanced Queueing (in conjunction with Workflow)
Heterogenous Services

Features I'd like to use:
Change Data Capture
XML
Spatial
interMedia
Regular Expressions

Database Supplied Packages:
DBMS_OUTPUT
DBMS_LOB
UTL_FILE
DBMS_APPLICATION_INFO
DBMS_CRYPTO
DBMS_UTILITY
DBMS_METADATA
DBMS_EPG
DBMS_HS_PASSTHROUGH
DBMS_JAVA
DBMS_JOB
DBMS_LOCK
DBMS_MVIEW
DBMS_OBFUSCATION_TOOLKIT
DBMS_RANDOM
DBMS_SESSION
DBMS_SQL
DBMS_STATS
DBMS_XDB
DBMS_XPLAN
HTMLDB_APPLICATION
HTMLDB_ITEM
HTMLDB_UTIL
HTP
OWA_COOKIE
UTL_MAIL
UTL_RAW
UTL_SMTP
WPG_DOCLOAD

Packages I'd like learn to use (10g):
UTL_DBWS
DBMS_CDC_PUBLISH
DBMS_CDC_SUBSCRIBE
DBMS_DATA_MINING
DBMS_DATAPUMP
DBMS_SCHEDULER
DBMS_OLAP
DBMS_PIPE
DBMS_STREAMS
DBMS_STREAMS_ADM
DBMS_STREAMS_AUTH
DBMS_STREAMS_MESSAGING

Wednesday, September 5, 2007

Working with the Business

I was once one of those annoying Business users who constantly pestered the IT department.

Probably one of the biggest reasons I got into IT was because I didn't like the control that IT (seemingly) had over me. I wanted that kind of control, but I also wanted to share it.

I have tried to carry my end user mentality with me to IT. I try and go out of my way to learn the business; from reading procedures to just sitting down with them and discussing what they do. This was fairly easy to do in the smaller companies I've worked with, but now that I am in a 3,000+ person company, it has gotten a bit more difficult.

I read an article by Steve Jones about getting closer to your business. I couldn’t agree more.

I would love (and have asked) to be able to spend more time with the Business folks. Perhaps shadowing them for a week, or just writing reports so I can get a better feel for what they do with the data/information we provide them. Better yet, make it part of the on-boarding for new employees. Spend 6 weeks with various departments or something like that. Some companies have management programs that do just that.

Anyway, I believe time spent with the Business makes me a better developer and hopefully builds up a good working relationship (trust) between IT and the Business.

Business Logic: In the Database or in the Application

When I started in the Data Warehouse, I began subscribing to as many BI/Data Warehousing blogs that I could find (there don't seem to be lot of them). Intelligent Enterprise (Roger Kimball) and Bill Inmon seem to be the most popular or well known. I also read Mark Rittman (and Mr. Mead), David Aldridge, Lewis Cunningham and Dratz who all seem to have more of an Oracle lean.

Today I read a post from Dan Linstedt titled "How Data Models can Impact Business." I followed the rather lengthy, but descriptive post, until I got to this part:

"Ok, I kind-of buy it, but what about Referential Integrity, when should that be enforced?
In two places:
1) When the data is captured within the application capturing it - it would clean up a LOT of these source systems, and put much more rigorous business logic (and cleaner data) into the source systems to begin with.
2) When the data is "released" for users, to reports, to screens, to output. This is when reusable common services / routines for getting data out are helpful. They implement the referential integrity in the application layer."

Now, I haven't been in IT all that long (5+ years now), but I put as much business logic into the database as I possibly can. I use the front end (APEX, woohoo!), to handle row color or something along those lines.

If you start with a good, flexible model, have a strong database team (DBAs, Developers) and you build a good API to your physical model (no INSERT, UPDATE or DELETE to any users, the only entry point to your tables being your API), flexibility and maintainability should not be a problem. Making changes should not entail a monumental effort. I suppose if the application in Mr. Linstedt's article is the only point of entry into the tables, I probably wouldn't really disagree (he's just moved it from my database API to his application), but then you have a giant bucket. Why not just use what you have paid for and build it in the database? That way, more than one application can use the same API over and over.