ORACLENERD
 
Thursday, August 28, 2008
  Jobs: ETL Developer
This position is in Tampa, Florida. If you are interested you can email me at chet dot justice at gmail.

Position Description:
· Develop and implement applications that support the business strategy of measuring and optimizing performance
· Support existing data warehouse applications and users

Mandatory Skills/Experience
· 2+ years experience working on data warehouse/data integration solutions
· 2+ years working with vendor ETL tools to design/build/test (Business Objects Data Integrator XIR2)
· 1+ years working with Oracle database; good PL/SQL knowledge
· Experience with dimensional data models (star schemas)

Preferred Skills/Experience
· Java programming experience
· Data Quality XIR2
· SQL Server 2005/SSIS
· Business Objects Enterprise XIR2
· Crystal Reports XIR2

Labels: , , ,

 
Wednesday, July 9, 2008
  Pseudo Column ROWDATE?
I'm working a little bit in the datawarehouse again (by the way, is it two words or one?). A technical guy from Oracle showed up today to help us decide the best way to move forward on capturing changes. There were essentially 4 methods:

1. SQL and PL/SQL
2. Streams (CDC)
3. OWB/ODI
4. Logical Standby/Data Guard

As someone on the phone was talking, I started to wander...hmmm...what about some kind of pseudo column for that stored the last update (either INSERT or UPDATE) of a row?

Off the top of my head, I can think of rowid, rownum, and level. I'm sure there are others (feel free to comment).

Oracle, I'm sure, stores that information some where right?

I hadn't really given a thought as to the feasibility or the impact it might have, but that would make capturing changes a whole lot easier...

Labels: ,

 
Wednesday, May 21, 2008
  Multi Table INSERTs
I learned something new today! Multi Table INSERTs.

I was trying to capture the new records into a staging table using RETURNING BULK COLLECT INTO, but that doesn't work (yet).

I remembered reading about Multi Table INSERTs, but never had the need for using them, so never bothered to learn it.


CREATE TABLE t
(
x NUMBER PRIMARY KEY,
y NUMBER,
z NUMBER
);

CREATE TABLE s
(
x NUMBER PRIMARY KEY
);

INSERT ALL
INTO t ( x, y, z )
VALUES ( myrownum, ran1, ran2 )
INTO s ( x )
VALUES ( myrownum )
SELECT
rownum myrownum,
dbms_random.value ran1,
dbms_random.value ran2
FROM dual
CONNECT BY LEVEL < 101;

CJUSTICE@ORA10GR2>SELECT COUNT(*) FROM t;

COUNT(*)
----------
100

1 row selected.

Elapsed: 00:00:00.01
CJUSTICE@ORA10GR2>SELECT COUNT(*) FROM s;

COUNT(*)
----------
100

1 row selected.

Elapsed: 00:00:00.02


Awesome!

Labels: , ,

 
Monday, May 19, 2008
  DIMENSION Objects
One of the very first things I did when I got my current job was to go through the Datawarehousing guide in the Oracle documentation.

Analytics, check.
Materialized Views, check.
OLAP, interesting, but not applicable yet.
Dimensions...hmm, dimensions.

I looked at all the objects in the database but couldn't find any of the type DIMENSION.

I searched AskTom, nothing (or at least nothing I could find).

I can't remember finding any articles on the DIMENSION objects either. Can that be right? Does any one our there use them? Mr. Rittman? Mr. Aldridge? Mr. Scott?

From my brief perusal of the documentation, you can set up the DIMENSION like this:

CREATE TABLE test_dim
(
test_key NUMBER(10,0) PRIMARY KEY,
source VARCHAR2(10) NOT NULL,
source_subcategory VARCHAR2(10)
);

INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 1, 'MEDICAID', NULL );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 2, 'MEDICARE', NULL );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 3, 'SSA', 'MEDICARE' );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 4, 'SPAP', 'MEDICARE' );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 5, 'MAPD', 'MEDICARE' );

CREATE DIMENSION test_dim
LEVEL source IS ( test_dim.source )
LEVEL source_subcategory IS ( test_dim.source_subcategory );
My best guess is that it's sort of like the heirarchical table format (FK referencing PK from the same table).

Does anyone out there have any experience using them?

Labels: ,

 
Friday, April 18, 2008
  Datawarehouse: Testing
Until recently, I have been doing support/enhancements and new development for one particular project.

In our development environment, I have essentially been doing full load testing, using all of the records that are in production.

Needless to say, our development iterations were slow. Make a change to the code and then reload some 100 million records which are bounced against one another. The good thing is that I've picked up some really good tuning skills. The bad thing is my project was recently 2 months late.

Now that I've had a chance to breathe, I've been able to rethink some of the processes. First and foremost being how we test (both in development and QA).

If I had taken the time one year ago to build a "build" and "teardown" script, I probably could have shaved months of the project life cycle.

I used SQLUnit a few years ago and found that to be a pretty good tool. At the time, it consisted of writing out XML files and was quite time consuming, but gave me more confidence in the code I wrote. It also taught me to think like a tester forcing me to write better code as I could predict much easier where the problems might be.

We've used no such tool at WellCare.

I believe database testing to be extremely difficult. Our unit tests have consisted of running the procedures and then pasting that to a Word document...not really my idea of good testing, but it's what we have.

So are there any tools that you use to test database specific items? If you don't use tools, how do you do it? Why does database testing seem so difficult?

Labels: , ,

 
Tuesday, January 15, 2008
  Asynchronous Distributed HotLog - CDC Part IV
Part I, II and III.

For the time being I am throwing in the towel. I managed to patch my 10.2.0.1 version up to 10.2.0.3 thinking that might help, but it did not.

I'm still leaning to something on the 9.2.0.6 side, the only reason being is that in the trace file generated by the 10g instance, I see a "Alter9iSource" line in there...but there is nothing in the alert log on the 9i instance.

So I'm packing it up for a little while, hopefully the time away will do me some good and help me regain focus.

I did find a great example of the manual setup by Lewis Cunningham; that may be the route in the future, just a bit more work to maintain.

Labels: , , , , ,

 
Wednesday, September 26, 2007
  To LOOP or Not To LOOP - Revisited
In a previous post, I pondered the decision of LOOPing (i.e. PL/SQL calls to a lookup). Given 8 days to code and a 90 hour week last week I think I came up with a decent solution.

Initially I went the easy route, writing PL/SQL functions that would populate collections and then do the matching. I tried to tune the functions so that the collection would be populated only once for each line of business and then ordering the result set that would be passed through. In my mind, this would reduce the number of cursors opened. I wasn't if that would make a difference or not, but I was going to give it a try.

Well, it was dog slow. One of the lookups calls was estimated to take days to complete (I only know this because I have gotten into the habit of using dbms_application_info.set_session_longops).

So I rethought it.

I only needed to get the keys right?

I first created a primary key on the staging table (same key that would be used in the dimension). I then created a "work" table that would reference that key and then store the other key that I needed to lookup. I also made the foreign key a unique key (one to one relationship) so that I would know immediately if I did something wrong. I then used straight SQL to populate that table. One pass for all the values that matched and then another pass with those that didn't have a matching record (padded rows).

From days to minutes...2 minutes in fact. So I applied that technique to two other lookup situations.

The code is still pretty straight-forward and adding another line of business is relatively easy (I checked this today and it took all of 10 minutes). All the fancy PL/SQL and collection stuff I had created was thrown out the window.

So I did manage to use straight SQL, the performance is phenomenal and maintenance will be easy.

It even passed muster (albeit grudgingly) with our more seasoned datawarehouse folks. Ultimately, they couldn't complain when it ran in under two hours...

Labels: , , ,

 
Wednesday, September 19, 2007
  To CONSTRAINT or Not to CONSTRAINT
I've been having these long (good) drawn out conversations with a colleague of mine recently about constraints in a datawarehouse environment. Since I come from an OLTP environment, I want to put a constraint on everything. He on the other hand wants any constraints enforced through code.

I can understand that constraints my slow down table loads, but with the volume of data we are currently using (100 million row tables), I just don't see that as a huge impact on performance. For me, writing less code is better than any minimal performance gains we might achieve.

For instance, today he told me he wanted to add a new column. The possible values would either be M or G. I asked (I'm lead on the project) him to throw a CHECK constraint on the table limiting the values that can go in that field.

For me, I guess it's a support issue and part database purist issue. Use it for more than a bucket, Oracle's expensive. It forces developers to deal with that and will immediately tell them if they have done something wrong. It also makes support a tad easier as they won't have to wonder what actually goes in the field.

We talked at length about it but never really came to a decision (i.e. I couldn't convince him).

I know that a benchmark test would be the best way to prove either way, but I don't have the time right now to do it. Perhaps when things slow down again I will.

Any suggestions out there?

Labels: , ,

 
Thursday, September 13, 2007
  To LOOP or Not to LOOP
I've already established my creds as a Tom Kyte follower.

Tom's mantra:

I'm faced with a new datawarehouse load, eventually it will be about 100 million records. If I go the straight SQL way (embedded in PL/SQL), I'll need to create multiple objects to support that method and each time we add a new line of business, we'll have to add more objects and more code.

If I can encapsulte it in PL/SQL, I could eliminate most of the necessary supporting objects, but at a big cost, performance. It would make coding time much quicker and ultimately more maintainable (in my opinion), adding a new line of business would simple by updating the packaged functions with an additional control check.

It's a difficult pill to swallow though. I'm originally an OLTP guy and I can write pretty good SQL. Being so new to datawarehousing concepts I feel like I am cheating somehow. Making that decision of when to move to slow-by-slow seems exceedingly difficult. Hopefully with more experience in the datawarehouse environment it will become easier...but it's still tough.

Labels: ,

 
Google



How To
Parallel Processing: DBMS_JOB
SAS: Create Dataset From Oracle Table
Instrumentation: DBMS_APPLICATION_INFO
DBMS_CRYPTO

Popular
AppDev vs DataDev
Code Style Index
Better than Tom Kyte?
Good Day to Worse Day

Archives
August 2007 / September 2007 / October 2007 / November 2007 / December 2007 / January 2008 / February 2008 / March 2008 / April 2008 / May 2008 / June 2008 / July 2008 / August 2008 / September 2008 /


 

Powered by Blogger

Aggregated by OraNA