ORACLENERD twitter/oraclenerd view chet justice's profile on LinkedIn feed oraclenerd t-shirts Unemployment Clock:

  Connect to HP Neoview using Heterogenous Services
Being the lazy sort that I am and not wanting to pay for a SQL client, I decided to use my local Oracle instance to access a HP Neoview database. Please don't ask me any questions about it, because like much of the world, I don't know either.

Hat(s) off to Tak Tang. I've used his guide a number of times throughout the years. If he blogs, I can't find it, so if you know about it, please link him up. The first time I used it was back in aught (sp?) 5 to connect to a DB2 instance.

Setup
Database (source): Oracle 10gR2
OS: Windows Vista Ultimate running as a VirtualBox Guest on Ubuntu
Database (target): HP Neoview 2.4 (or something)
OS: Doesn't matter

Materials
HP Neoview driver which can be found here. (Thanks Christian)

Steps
Download the HP Neoview driver and install it. Since it is windows, just click away accepting all the defaults.

Next, configure an ODBC Datasource. I won't go into the gory details, but it's pretty easy. When you are done, you should see something like this:

ODBC Datasource

Now comes the fun part.

In your <ORACLE_HOME>\hs\admin directory you should see the following:

HS Folder

Open up the inithsodbc.ora file and set the parameters as follows:
HS_FDS_CONNECT_INFO = NEOVIEW
HS_FDS_TRACE_LEVEL = 1
Save the file as initNEOVIEW.ora in the same directory.

Now traverse to <ORACLE_HOME>\network\admin and open up your listener.ora file. Add a SID_DESC so that your file now looks like this:
LISTENER=
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521))
(ADDRESS=(PROTOCOL=ipc)(KEY=extproc))))
SID_LIST_LISTENER=
(SID_LIST=
(SID_DESC=
(GLOBAL_DBNAME=testing)
(ORACLE_HOME=c:\oracle)
(SID_NAME=testing))
(SID_DESC=
(SID_NAME=plsextproc)
(ORACLE_HOME=c:\oracle)
(PROGRAM=extproc))
(SID_DESC=
(SID_NAME=NEOVIEW)
(ORACLE_HOME=c:\oracle)
(PROGRAM=hsodbc)))
Reload or stop and start your listener.

Now open up your tnsnames.ora file and add an entry for NEOVIEW. It should look like this:
NEOVIEW =
(DESCRIPTION=
(ADDRESS_LIST=
(ADDRESS=
(PROTOCOL=TCP)
(HOST=localhost)
(PORT=1521)
)
)
(CONNECT_DATA=(SID=NEOVIEW))
(HS=OK)
)
The last step is to create a database link.
CREATE DATABASE LINK neoview
CONNECT TO "username"
IDENTIFIED BY "password"
USING 'NEOVIEW';
You should be all set now. You can test it out by issueing a simple
SELECT * FROM dual
If that works, you're done.

If you have problems, check out Tak's Troubleshooting section.

Labels: , ,

 
  Baseball Data Modeling
Anyone out there like baseball? Ever had a desire to model a baseball game?

I do and I have tried a few times in the past. It get's pretty hairy down at the game game/inning/player level. If I remember correctly, substitutions tripped me up a bit. There there's the whole datawarehouse side, I'd like that to be part of the project as well.

I started a project on Google Code here, the name is pretty vanilla, baseball-database. If you join you can suggest a better name.

I'd like to try and talk Oracle into giving a few licenses to the recently released production version of SQL Developer Data Modeler. I've hit up @krisrice on Twitter, but he has no control over licenses, just development. I've also hit up Justin Kestelyn (@oracletechnet) who said he would look into it.

I've had no time lately to bother him; perhaps with a few more people...

If you are interested, just drop me a line chet at oraclenerd or message me through twitter.

Labels: , , ,

 
  What's the Purpose of a Datawarehouse?
I've been meaning to write this since I first began my foray into the world of datawarehousing back in December of 2006.

What is the purpose of a Datawarehouse?

  1. Reporting

    • Key Performance Indicators (KPI)

    • Performance - Data is stored in such a way (facts and dimensions) which enables higher performance against the same relational tables in an OLTP system.

    • Slicing and Dicing - Viewing the data from multiple angles (dimensions) is the one of the best aspects of the Datawarehouse. Also included is the ability to drill down into the finer grained details.

  2. Single Source of Truth - Especially important in an enterprise environment which may have many systems of record. The datawarehouse allows you to aggregate all those systems into one environment. Not always completed, it's more of an ideal than anything.

  3. Clean Source Data (Transformation) - The datawarehouse allows you to "fix" the sub-par OLTP systems. Think garbage in, garbage out.

From my point of view as a database developer, #3 is the most important. I believe that the "Transformation" stage of ETL should be easy if only the source system had been designed properly. I've seen instances where a column (VARCHAR2(30) for example) meant one thing at one point in time and another at a different point in time. Or a data was stored in the column as MM/DD/YYYY, DD-MON-YY or Month DD, YYYY. How do you clean that up?

I also recognize that in an enterprise environment you can have the best designed OLTP systems but you still have to map the data to one field.

Ultimately, I think #1 should drive the design of all OLTP systems, making concessions where needed for performance. From there, your datawarehouse should flow fairly naturally. The ability to report on the data that you gather allows the business to make decisions for the future. In other words, it's the data stupid.

Labels:

 
  How To Populate Your TIME Dimension
Yesterday I wrote about the Datawarehousing TIMES table. Today I populated it. I modified it for my owner purposes naturally, specifically removing the fiscal components (thankfully calendar year equals fiscal year). I think that saved me days of figuring out how to calculate certain fields.

I remember the first time trying to do this took about 20 different SQL statements. I generated the key (date, time_id) and then calculate that either loop or perform another SQL statement using specific dates out of the table (last day of the month for instance). This time I managed to do it in a single SQL statement. Again, I left out some of the more complicated calculations since I did not have to worry about fiscal requirements.

I also got a short refresher course in the date format models which always come in handy and learned a new function, NEXT_DAY.
NEXT_DAY returns the date of the first weekday named by char that is later than the date date. The return type is always DATE, regardless of the datatype of date. The argument char must be a day of the week in the date language of your session, either the full name or the abbreviation. The minimum number of letters required is the number of letters in the abbreviated version. Any characters immediately following the valid abbreviation are ignored. The return value has the same hours, minutes, and seconds component as the argument date.
Here's the SQL to populate your TIME Dimension:
SELECT 
TRUNC( sd + rn ) time_id,
TO_CHAR( sd + rn, 'fmDay' ) day_name,
TO_CHAR( sd + rn, 'D' ) day_number_in_week,
TO_CHAR( sd + rn, 'DD' ) day_number_in_month,
TO_CHAR( sd + rn, 'DDD' ) day_number_in_year,
TO_CHAR( sd + rn, 'W' ) calendar_week_number,
( CASE
WHEN TO_CHAR( sd + rn, 'D' ) IN ( 1, 2, 3, 4, 5, 6 ) THEN
NEXT_DAY( sd + rn, 'SATURDAY' )
ELSE
( sd + rn )
END ) week_ending_date,
TO_CHAR( sd + rn, 'MM' ) calendar_month_number,
TO_CHAR( LAST_DAY( sd + rn ), 'DD' ) days_in_cal_month,
LAST_DAY( sd + rn ) end_of_cal_month,
TO_CHAR( sd + rn, 'FMMonth' ) calendar_month_name,
( ( CASE
WHEN TO_CHAR( sd + rn, 'Q' ) = 1 THEN
TO_DATE( '03/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 2 THEN
TO_DATE( '06/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 3 THEN
TO_DATE( '09/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 4 THEN
TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
END ) - TRUNC( sd + rn, 'Q' ) + 1 ) days_in_cal_quarter,
TRUNC( sd + rn, 'Q' ) beg_of_cal_quarter,
( CASE
WHEN TO_CHAR( sd + rn, 'Q' ) = 1 THEN
TO_DATE( '03/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 2 THEN
TO_DATE( '06/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 3 THEN
TO_DATE( '09/30/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
WHEN TO_CHAR( sd + rn, 'Q' ) = 4 THEN
TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
END ) end_of_cal_quarter,
TO_CHAR( sd + rn, 'Q' ) calendar_quarter_number,
TO_CHAR( sd + rn, 'YYYY' ) calendar_year,
( TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' )
- TRUNC( sd + rn, 'YEAR' ) ) days_in_cal_year,
TRUNC( sd + rn, 'YEAR' ) beg_of_cal_year,
TO_DATE( '12/31/' || TO_CHAR( sd + rn, 'YYYY' ), 'MM/DD/YYYY' ) end_of_cal_year
FROM
(
SELECT
TO_DATE( '12/31/2002', 'MM/DD/YYYY' ) sd,
rownum rn
FROM dual
CONNECT BY level <= 6575
)
/
You can find the table definition and the INSERT statement here as well.

Any feedback good, bad or ugly is welcome.

Labels: , ,

 
  Time
Many years ago (about 6 or 7 really), my boss sent me this diagram from the Oracle Docs. In particular, he wanted me to focus on the times table. After a few emails exchanged I finally figured out what he was asking, well sort of. He had used a table called periods described below:
CREATE TABLE periods
(
periodid NUMBER(10),
datefrom DATE,
datethru DATE,
periodtype VARCHAR2(30),--DAY, MONTH, YEAR, ETC
);
This was primarily (solely perhaps) used for reporting. He had actually built this pretty cool page for the business to be able to drill down on certain key aspects of the business. That may have been my first real exposure to Data Warehousing.

The point he was trying to make though was that this TIMES table was a flattened version of his PERIODS table. If you have one date, you can see how that date fell in multiple dimensions. What month it took place. What quarter it took place. What day of the year (1-365) it took place. All with just a single row of data.

The TIMES table, or dimension is a standard in data warehousing for exactly this reason.

I've been working on our "data warehouse" (replicated tables from the OLTP) and had been using the PERIODS methodology. There is a time dimension, but it's not populated at this time.

Since I am by no means a data warehousing expert and I have virtually no experience with OBIEE, I'm trying to come up with a way to help bridge both those gaps. Use OBIEE against the relational tables to help me learn OBIEE. I've created a couple of small subject areas so far and demo'd them to the users and they love it. So do I in fact.

Anyway, back to my topic. As I was thinking about TIME, I began to reminisce about time and how I got started. It's funny how I remember that exact moment sitting there with my feet propped up on my desk in my garage office.

Was there a point to all this? I don't think so. Just me rambling on.
I'm not sure exactly what triggered this ton

Labels: ,

 
  OBIEE: Direct Database Request
Of course a couple of our leads asked last week if they could get access to OBIEE to do some reports. Naturally, they didn't want the point and click variety, they had the SQL (our OLTP is replicated to our DW as well).

I had no idea how to do it of course. Click click clicking around I couldn't find anything. Wait, BI Administrator. That's how you add users. OK, good enough.

How do you give them access to write Direct Database Request SQL? I'm still not sure but the answer probably lies in the "Answer" section. From there, you can either select a subject area or select a "Direct Database Request." With the Direct Database Request all you need is the Connection Pool information (still not positive on how this is formatted) and your SQL statement.

The Connection Pool entry looks something like this:
"DEV_BI"."Connection Pool"

You do not need to put the relevant tables into the Physical Layer of the Administration tool. That makes life a little easier.

Labels: ,

 
  OBIEE: Copy Reports From Prod to Dev
Yes, I said from Prod to Dev.

As I recently reacquired my job via the DW guy resigning, I'm in charge of all his stuff. Which basically means I get to learn the easy way, fire.

This means that I haven't had any time to "learn." I've just had to do. Fortunately so far, I haven't been asked to do a whole lot with the tool. Until now.

We have a large migration in the pipe right now...massive changes to the core of our OLTP system which require massive retooling of our DW system. Naturally I've been working on the database side of things. I started out with some 150 invalid objects and I've whittled it down to 3.

What about the reports though? That SQL is not stored in the database (at least as I understand it). Last week I began looking for a tool in OBIEE that would allow me to see database dependencies, kind of like Application Express' tool. Alas I couldn't find one, but I didn't have a whole lot of time to search. And I'm still learning the OBIEE lingo so I don't really know what to search for.

Anyway, I chatted with my former colleague and he told me I could just update the catalog in the Dev environment. Talked to the DBA and the Linux admin about getting access but it never materialized (it was Friday I think and I was out sick today). I still don't know what all is involved other than there's some sort of file I can just swap out...which scares me. Chances are I'll write over something.

So, off to Mr. Rittman's site.

First, Migrating OBIEE Projects Between DEV and PROD Environments, which led me to this one, Migrating OBIEE Reports Between Web Catalogs.

I hadn't seen (or heard) about the BI Catalog Manager, so I opened it up. It want's a directory or URL. Hmmm...I have no idea. I'll have to get the admin guys to help me out with this one. Let me just try the login URL and see if that works:

http://bi_server_prod:7777/analytics/

Nope. But there's a hint, URL must be in the form of...

http://bi_server_prod:7777/analytics/saw.dll

Got it. Sweet. No outside intervention necessary. Log in to dev:

http://bi_server_dev:7777/analytics/saw.dll

Voila!

Copy and paste the reports over and I'm done. Now the arduous task of going through each report to make sure it's not broken. Should be fun.

Labels: , ,

 
  Learning Oracle Business Intelligence (OBIEE)
I learned how to edit a user in OBIEE without breaking anything (thus no need for a new counter Jake).

At first I thought it would be done through the web application only to find no edit button associated with the user. I had to go through the online repository, update the user, and commit those changes. Not really sure what all that means yet, but it's a start.

I had a crash tutorial on this last week as the person I replaced was off to start his own company. OBIEE is a vast, vast world.

Where does one go to learn OBIEE from the bottom? OTN naturally. The Oracle By Example series is excellent so that's where I started.

First up, Creating a Repository Using the Oracle Business Intelligence Administration Tool . I've made it through the Physical Mappings and the Business Mappings so far, one more section to go. I can't say I completely understand all the lingo, but I'll catch on sooner or later. I'm more concerned with the basics of support at this point.

Updated 2/28/2010
Blogs
- The RittmanMead blog is the premier online resource for all things OBI EE. It can be found here.
- John Minkjan over at OBIEE 101 has quite a bit of information. He posts often and detail posts. I've already been able to reference individual posts a number of times.
- gerardnico There are some great resources here.
- Christian Berg - who has helped me quite a bit writes over here.
- rnm1978 - same goes for this guy. Very helpful articles.
- OTN Forums

BI Architecture Series
This just started recently by Mark Rittman, I'm looking forward to future posts in this regard:

Inside the Oracle BI Server Part 1 : The BI Server Architecture
Inside the Oracle BI Server Part 2 : How Is A Query Processed?
Inside the Oracle BI Server Part 3 : BI Server In-Memory Joins

How ToInstallation
Miscellaneous
My articles on said subject can be found here.
I'm sure as time goes on my tastes will ebb and flow. I'll update as appropriate.

Labels: , , ,

 
  More on Oracle Exadata Storage
I don't know why I'm so excited about this...I guess I'm trying to live up to my name.

See my previous post here full of a few screen shots from Larry's speech.

A great post on most of the stuff I don't understand from Kevin Closson who works more on the datawarehouse side.

Paul Vallee over at the Pythian Group blog ruminates here and here.

Curt Monash talks about it here and here.

Dana Gardner from ZDNet.

Larry Dignan from CNET.com via ZDNet Australia.

Michael Liedtke via AP.

And how could I forget my local Oracle blogger, Mr. Cunningham.

I'm still amazed at how excited I am. It's just a machine right? Hopefully I'll get to use one of those someday...

Labels: , ,

 
  Oracle Exadata Storage
So the big announcement today...Larry created an appliance with HP as the hardware.

Press Release

Exadata

Here's a picture of the new beast:

In comparison to Larry:

Some of the characteristics:

And finally, it runs OLTP too!

Labels: , , ,

 
  Jobs: ETL Developer
This position is in Tampa, Florida. If you are interested you can email me at chet dot justice at gmail.

Position Description:
· Develop and implement applications that support the business strategy of measuring and optimizing performance
· Support existing data warehouse applications and users

Mandatory Skills/Experience
· 2+ years experience working on data warehouse/data integration solutions
· 2+ years working with vendor ETL tools to design/build/test (Business Objects Data Integrator XIR2)
· 1+ years working with Oracle database; good PL/SQL knowledge
· Experience with dimensional data models (star schemas)

Preferred Skills/Experience
· Java programming experience
· Data Quality XIR2
· SQL Server 2005/SSIS
· Business Objects Enterprise XIR2
· Crystal Reports XIR2

Labels: , , ,

 
  Pseudo Column ROWDATE?
I'm working a little bit in the datawarehouse again (by the way, is it two words or one?). A technical guy from Oracle showed up today to help us decide the best way to move forward on capturing changes. There were essentially 4 methods:

1. SQL and PL/SQL
2. Streams (CDC)
3. OWB/ODI
4. Logical Standby/Data Guard

As someone on the phone was talking, I started to wander...hmmm...what about some kind of pseudo column for that stored the last update (either INSERT or UPDATE) of a row?

Off the top of my head, I can think of rowid, rownum, and level. I'm sure there are others (feel free to comment).

Oracle, I'm sure, stores that information some where right?

I hadn't really given a thought as to the feasibility or the impact it might have, but that would make capturing changes a whole lot easier...

Labels: ,

 
  Multi Table INSERTs
I learned something new today! Multi Table INSERTs.

I was trying to capture the new records into a staging table using RETURNING BULK COLLECT INTO, but that doesn't work (yet).

I remembered reading about Multi Table INSERTs, but never had the need for using them, so never bothered to learn it.


CREATE TABLE t
(
x NUMBER PRIMARY KEY,
y NUMBER,
z NUMBER
);

CREATE TABLE s
(
x NUMBER PRIMARY KEY
);

INSERT ALL
INTO t ( x, y, z )
VALUES ( myrownum, ran1, ran2 )
INTO s ( x )
VALUES ( myrownum )
SELECT
rownum myrownum,
dbms_random.value ran1,
dbms_random.value ran2
FROM dual
CONNECT BY LEVEL < 101;

CJUSTICE@ORA10GR2>SELECT COUNT(*) FROM t;

COUNT(*)
----------
100

1 row selected.

Elapsed: 00:00:00.01
CJUSTICE@ORA10GR2>SELECT COUNT(*) FROM s;

COUNT(*)
----------
100

1 row selected.

Elapsed: 00:00:00.02


Awesome!

Labels: , ,

 
  DIMENSION Objects
One of the very first things I did when I got my current job was to go through the Datawarehousing guide in the Oracle documentation.

Analytics, check.
Materialized Views, check.
OLAP, interesting, but not applicable yet.
Dimensions...hmm, dimensions.

I looked at all the objects in the database but couldn't find any of the type DIMENSION.

I searched AskTom, nothing (or at least nothing I could find).

I can't remember finding any articles on the DIMENSION objects either. Can that be right? Does any one our there use them? Mr. Rittman? Mr. Aldridge? Mr. Scott?

From my brief perusal of the documentation, you can set up the DIMENSION like this:

CREATE TABLE test_dim
(
test_key NUMBER(10,0) PRIMARY KEY,
source VARCHAR2(10) NOT NULL,
source_subcategory VARCHAR2(10)
);

INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 1, 'MEDICAID', NULL );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 2, 'MEDICARE', NULL );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 3, 'SSA', 'MEDICARE' );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 4, 'SPAP', 'MEDICARE' );
INSERT INTO test_dim( test_key, source, source_subcategory )
VALUES ( 5, 'MAPD', 'MEDICARE' );

CREATE DIMENSION test_dim
LEVEL source IS ( test_dim.source )
LEVEL source_subcategory IS ( test_dim.source_subcategory );
My best guess is that it's sort of like the heirarchical table format (FK referencing PK from the same table).

Does anyone out there have any experience using them?

Labels: ,

 
  Datawarehouse: Testing
Until recently, I have been doing support/enhancements and new development for one particular project.

In our development environment, I have essentially been doing full load testing, using all of the records that are in production.

Needless to say, our development iterations were slow. Make a change to the code and then reload some 100 million records which are bounced against one another. The good thing is that I've picked up some really good tuning skills. The bad thing is my project was recently 2 months late.

Now that I've had a chance to breathe, I've been able to rethink some of the processes. First and foremost being how we test (both in development and QA).

If I had taken the time one year ago to build a "build" and "teardown" script, I probably could have shaved months of the project life cycle.

I used SQLUnit a few years ago and found that to be a pretty good tool. At the time, it consisted of writing out XML files and was quite time consuming, but gave me more confidence in the code I wrote. It also taught me to think like a tester forcing me to write better code as I could predict much easier where the problems might be.

We've used no such tool at WellCare.

I believe database testing to be extremely difficult. Our unit tests have consisted of running the procedures and then pasting that to a Word document...not really my idea of good testing, but it's what we have.

So are there any tools that you use to test database specific items? If you don't use tools, how do you do it? Why does database testing seem so difficult?

Labels: , ,

 
  Asynchronous Distributed HotLog - CDC Part IV
Part I, II and III.

For the time being I am throwing in the towel. I managed to patch my 10.2.0.1 version up to 10.2.0.3 thinking that might help, but it did not.

I'm still leaning to something on the 9.2.0.6 side, the only reason being is that in the trace file generated by the 10g instance, I see a "Alter9iSource" line in there...but there is nothing in the alert log on the 9i instance.

So I'm packing it up for a little while, hopefully the time away will do me some good and help me regain focus.

I did find a great example of the manual setup by Lewis Cunningham; that may be the route in the future, just a bit more work to maintain.

Labels: , , , , ,

 
  To LOOP or Not To LOOP - Revisited
In a previous post, I pondered the decision of LOOPing (i.e. PL/SQL calls to a lookup). Given 8 days to code and a 90 hour week last week I think I came up with a decent solution.

Initially I went the easy route, writing PL/SQL functions that would populate collections and then do the matching. I tried to tune the functions so that the collection would be populated only once for each line of business and then ordering the result set that would be passed through. In my mind, this would reduce the number of cursors opened. I wasn't if that would make a difference or not, but I was going to give it a try.

Well, it was dog slow. One of the lookups calls was estimated to take days to complete (I only know this because I have gotten into the habit of using dbms_application_info.set_session_longops).

So I rethought it.

I only needed to get the keys right?

I first created a primary key on the staging table (same key that would be used in the dimension). I then created a "work" table that would reference that key and then store the other key that I needed to lookup. I also made the foreign key a unique key (one to one relationship) so that I would know immediately if I did something wrong. I then used straight SQL to populate that table. One pass for all the values that matched and then another pass with those that didn't have a matching record (padded rows).

From days to minutes...2 minutes in fact. So I applied that technique to two other lookup situations.

The code is still pretty straight-forward and adding another line of business is relatively easy (I checked this today and it took all of 10 minutes). All the fancy PL/SQL and collection stuff I had created was thrown out the window.

So I did manage to use straight SQL, the performance is phenomenal and maintenance will be easy.

It even passed muster (albeit grudgingly) with our more seasoned datawarehouse folks. Ultimately, they couldn't complain when it ran in under two hours...

Labels: , , ,

 
  To CONSTRAINT or Not to CONSTRAINT
I've been having these long (good) drawn out conversations with a colleague of mine recently about constraints in a datawarehouse environment. Since I come from an OLTP environment, I want to put a constraint on everything. He on the other hand wants any constraints enforced through code.

I can understand that constraints my slow down table loads, but with the volume of data we are currently using (100 million row tables), I just don't see that as a huge impact on performance. For me, writing less code is better than any minimal performance gains we might achieve.

For instance, today he told me he wanted to add a new column. The possible values would either be M or G. I asked (I'm lead on the project) him to throw a CHECK constraint on the table limiting the values that can go in that field.

For me, I guess it's a support issue and part database purist issue. Use it for more than a bucket, Oracle's expensive. It forces developers to deal with that and will immediately tell them if they have done something wrong. It also makes support a tad easier as they won't have to wonder what actually goes in the field.

We talked at length about it but never really came to a decision (i.e. I couldn't convince him).

I know that a benchmark test would be the best way to prove either way, but I don't have the time right now to do it. Perhaps when things slow down again I will.

Any suggestions out there?

Labels: , ,

 
  To LOOP or Not to LOOP
I've already established my creds as a Tom Kyte follower.

Tom's mantra:

  • You should do it in a single SQL statement if at all possible.

  • If you cannot do it in a single SQL Statement, then do it in PL/SQL.

  • If you cannot do it in PL/SQL, try a Java Stored Procedure.

  • If you cannot do it in Java, do it in a C external procedure.

  • If you cannot do it in a C external routine, you might want to seriously think about why it is you need to do it…


I'm faced with a new datawarehouse load, eventually it will be about 100 million records. If I go the straight SQL way (embedded in PL/SQL), I'll need to create multiple objects to support that method and each time we add a new line of business, we'll have to add more objects and more code.

If I can encapsulte it in PL/SQL, I could eliminate most of the necessary supporting objects, but at a big cost, performance. It would make coding time much quicker and ultimately more maintainable (in my opinion), adding a new line of business would simple by updating the packaged functions with an additional control check.

It's a difficult pill to swallow though. I'm originally an OLTP guy and I can write pretty good SQL. Being so new to datawarehousing concepts I feel like I am cheating somehow. Making that decision of when to move to slow-by-slow seems exceedingly difficult. Hopefully with more experience in the datawarehouse environment it will become easier...but it's still tough.

Labels: ,

 


Guest Authors

How To

Popular

Previous Posts

Code Projects

Archives
August 2007 / September 2007 / October 2007 / November 2007 / December 2007 / January 2008 / February 2008 / March 2008 / April 2008 / May 2008 / June 2008 / July 2008 / August 2008 / September 2008 / October 2008 / November 2008 / December 2008 / January 2009 / February 2009 / March 2009 / April 2009 / May 2009 / June 2009 / July 2009 / August 2009 / September 2009 / October 2009 / November 2009 / December 2009 / January 2010 / February 2010 / March 2010 /


Aggregated by OraNA