In a previous post, I pondered the decision of LOOPing (i.e. PL/SQL calls to a lookup). Given 8 days to code and a 90 hour week last week I think I came up with a decent solution.
Initially I went the easy route, writing PL/SQL functions that would populate collections and then do the matching. I tried to tune the functions so that the collection would be populated only once for each line of business and then ordering the result set that would be passed through. In my mind, this would reduce the number of cursors opened. I wasn't if that would make a difference or not, but I was going to give it a try.
Well, it was dog slow. One of the lookups calls was estimated to take days to complete (I only know this because I have gotten into the habit of using dbms_application_info.set_session_longops).
So I rethought it.
I only needed to get the keys right?
I first created a primary key on the staging table (same key that would be used in the dimension). I then created a "work" table that would reference that key and then store the other key that I needed to lookup. I also made the foreign key a unique key (one to one relationship) so that I would know immediately if I did something wrong. I then used straight SQL to populate that table. One pass for all the values that matched and then another pass with those that didn't have a matching record (padded rows).
From days to minutes...2 minutes in fact. So I applied that technique to two other lookup situations.
The code is still pretty straight-forward and adding another line of business is relatively easy (I checked this today and it took all of 10 minutes). All the fancy PL/SQL and collection stuff I had created was thrown out the window.
So I did manage to use straight SQL, the performance is phenomenal and maintenance will be easy.
It even passed muster (albeit grudgingly) with our more seasoned datawarehouse folks. Ultimately, they couldn't complain when it ran in under two hours...