Tuesday, April 29, 2008

Validating a Process

I mentioned sometime back that I would be posting the code. I can't post the entire set of code but I can post the relevant parts.


Inbound Remittance Advice files are loaded into our Operational Data Store (ODS). We have absolutely no control over this; it lies with another group. On occasion, those files are double, or even triple, loaded.


The goal: to provide the business (and ourselves) with a way to track when files came in and whether the entire process worked as expected (no lost dollars, no lost record counts).

What do we do?

1. Read the files when they first come in (on disk).
2. Query the appropriate tables at certain intervals to verify it matches the file amounts.
3. Load those results into a table and then fail the normal load process if we detect any incongruities.
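Steps 2 and 3 boil down to a dollar-and-count comparison. Here's a minimal sketch of that check (the class and parameter names are mine, not from our actual code):

```java
import java.math.BigDecimal;

public class RemitReconciler {

    // Returns true when the totals parsed from the inbound file match
    // what the ODS tables report. compareTo (not equals) is used so
    // 10.0 and 10.00 compare as the same dollar amount.
    public static boolean matches( BigDecimal fileDollars, long fileCount,
                                   BigDecimal tableDollars, long tableCount ) {
        return fileDollars.compareTo( tableDollars ) == 0
            && fileCount == tableCount;
    }
}
```

When this comes back false, the normal load process is failed and the incongruity is recorded.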

Solution (proposed)

1. Create a directory object on the Oracle server.
2. Copy inbound files to that directory.
3. Read files from that directory (Java anyone?)
4. Load files into CLOBs (so that I don't have to spend half my day finding the damn things, simple APEX app and I'm good to go).
5. Parse files to find relevant information (Java)
6. Query tables at various stages, blah blah blah.


Since UTL_FILE doesn't have a function to read the contents of a directory, Java comes into play. I've done it before; I originally found the code on asktom (where else?).

For those of you too lazy (like me) to click the link, here's the important stuff:
File file = new File( path );

String[] list = file.list();

for ( int i = 0; i < list.length; i++ )
{
    File individualFile = new File( path + list[i] );

    if ( individualFile.isFile() )
    {
        String element = list[i];
        // 'statement' is the prepared INSERT that records each file name
        statement.setString( 1, element );
    }
}
Since I'm using Java, I might as well use the StringTokenizer; it makes life so much easier. But wait, since I'm reading it as a CLOB (and not a String), what do I do? I tried clob.toString(). Nope, that's just a pointer to the actual CLOB.

I have to use a Reader and a character stream, getting well beyond my knowledge of Java.

With the help of a fellow Java developer, I was pointed towards StreamTokenizer, which works in a similar fashion to StringTokenizer, or so I thought. Apparently StreamTokenizer is one of the lower-level classes...so I had to figure out the ASCII value of the character I wanted to split on (a tilde: ~). I think my Java friend was surprised I figured this out...

Reader characterStream = clob.getCharacterStream();
StreamTokenizer stream = new StreamTokenizer( characterStream );
// everything from space (32) through '}' (125) is a word character,
// so tokens break on the tilde (ASCII 126)
stream.wordChars( 32, 125 );

After I got that as a String, I could then use the StringTokenizer, which I knew.

Could I have split the CLOB using PL/SQL? Yes. Did I want to? Not really. I was already using Java so why not just use the Tokenizer?
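Putting the CLOB's Reader and the StreamTokenizer together, here's a self-contained sketch of the tilde-splitting (plain JDK; the class name and the resetSyntax() call are my additions — resetSyntax() clears the default syntax table so digit-only fields aren't parsed as numbers):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.ArrayList;
import java.util.List;

public class TildeSplit {

    // Splits a tilde-delimited stream into fields. Characters 32..125
    // are declared word characters, so the tilde (ASCII 126) is the
    // only delimiter between tokens.
    public static List<String> split( Reader in ) throws IOException {
        StreamTokenizer stream = new StreamTokenizer( in );
        stream.resetSyntax();
        stream.wordChars( 32, 125 );

        List<String> fields = new ArrayList<String>();
        while ( stream.nextToken() != StreamTokenizer.TT_EOF ) {
            if ( stream.ttype == StreamTokenizer.TT_WORD ) {
                fields.add( stream.sval );
            }
        }
        return fields;
    }
}
```

Once each field is a String, StringTokenizer (or plain String handling) takes over as described above.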

The five classes were jarred and loaded into Oracle via the loadjava command. Wrap it all up in PL/SQL and life is easy!

PROCEDURE get_directory_contents( i_directory IN VARCHAR2 )
AS LANGUAGE JAVA
NAME 'com.hmocompany.dw.LoadFileNames.getFileNames( java.lang.String )';
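For reference, a hedged sketch of what a class like com.hmocompany.dw.LoadFileNames might look like. The real getFileNames returns void and binds each name into an INSERT over the default server-side JDBC connection; returning a List here keeps the sketch self-contained:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LoadFileNames {

    // Lists the regular files (no subdirectories) found in 'path'.
    // Inside Oracle, each name would be bound into an INSERT rather
    // than returned to the caller.
    public static List<String> getFileNames( String path ) {
        List<String> names = new ArrayList<String>();
        String[] list = new File( path ).list();
        if ( list == null ) {
            return names; // not a directory, or not readable
        }
        for ( int i = 0; i < list.length; i++ ) {
            File individualFile = new File( path, list[i] );
            if ( individualFile.isFile() ) {
                names.add( list[i] );
            }
        }
        return names;
    }
}
```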

Thursday, April 24, 2008

ApEx Presenting = FUN!

Wow...that's all I can say.

At lunch today, I put together my PowerPoint presentation. I meant to do it sooner, of course, but it's been another busy week. In fact, I hadn't done a thing to prepare other than walking through it in my head.

I did not want this to be a presentation so much as a demo. Six slides is all they got.

I suddenly got very nervous around 4 today. I just want to get over there and start. I ended up leaving work just before 5.

The meeting was at the PriceWaterhouseCoopers building in Tampa, just across the street from the Tampa Bay Buccaneers headquarters. Very nice building.

The Presentation

I was introduced by the SOUG president promptly at 6:30. Roughly 40 people showed up (filled the room).

I had an hour and a half to complete my presentation...ummm...WHAT? I've got like 10 minutes worth of material! How the hell am I going to manage this??

Anyway, I walked through some of the features of APEX: Load/Unload data, SQL Commands, that kind of stuff, just trying to get to the application builder.

A couple of nights ago I began walking through and building a basic little reporting application tailored to the DBAs. Reports on roles and privileges basically. I never even got to that.

I created a 2-line csv (yes, I should have done it beforehand), uploaded it to demonstrate how easy it was, and then off to creating a report on that table.

"Can you create a form to update that record?"

Sure, here's how you do it. One minute later the form was done and I had updated the record. True to form...it's just so damn easy.

I answered a few other questions and then I got stuck. A member of the audience started answering the questions that I couldn't. We (the member in the audience and I) had spoken before the meeting, and I believe he's just as passionate about APEX as I am. It worked rather well, so well in fact, that we're going to try and work out a dual presentation at the upcoming Technology Day SOUG puts on.

Someone would ask a question, I would answer them by showing them how to do it. Have I mentioned how much I like APEX? Rocks.

Next thing I know, it's a little past 8 and I can see the President trying to end the meeting. No way dude, too many questions to answer!

I got out of there around 9:30. I answered questions to the best of my ability...and if I didn't know, I pointed them to resources that did.


Can't wait to do it again...but next time I'll practice!

BTW, great crowd tonight. I couldn't have asked for a better group. Thanks to everyone for their support, and thanks to Tom and LewisC for asking me to do it.

Did I mention how cool it was? ;)

Monday, April 21, 2008

ApEx Presentation

I'll be doing my first professional IT presentation this Thursday for the Suncoast Oracle Users Group.

I'm a bit nervous, but excited at the same time. My goal is to make it as interactive as possible and just let it take me where it takes me.

I will have a canned demo, but I would rather it be more fluid. Can't always plan for those things...I could be a total bust!

So, if you're in the neighborhood and you want to check it out (or just laugh and heckle me), come on by. Details can be found here.

It's a Matter of Time

I've been thinking a lot lately about my recent failed deployments.

How did I get so sloppy?

I'm not one to make excuses, but I would say there are some mitigating circumstances at least. Time being one of them.

So I got out my trusty calculator (SQL*Plus), and ran the numbers.

From August 26, 2007 through March 30, 2008, I've worked 1802 hours. Of that, 118 were PTO or holiday, which brings the total down to 1784. For perspective, a work year of 8 hours per day comes out to 2080 hours a year.

VARIABLE hours NUMBER
EXEC :hours := 1784

SELECT ROUND( ( :hours / 2080 ) * 100, 1 ) per_of_tot_year_hours
  FROM dual;


Cool! 86% of a full year's hours in a little over 6 months!

Obviously, this is not a good thing.

Further breaking the numbers down:

VARIABLE c VARCHAR2(9)
EXEC :c := '26-AUG-07';

SELECT start_day,
       end_day,
       days_between db,
       SUM( business_days ) bd,
       ROUND( ( :hours / days_between ), 1 ) hpd,
       ROUND( ( :hours / ( days_between / 7 ) ), 1 ) hpdw,
       ROUND( ( :hours / SUM( business_days ) ), 1 ) hpwd,
       ROUND( ( :hours / SUM( business_days / 5 ) ), 1 ) hpww
  FROM ( SELECT start_day,
                end_day,
                TRUNC( end_day - start_day ) days_between,
                start_day + rownum dayof,
                ( CASE
                    WHEN TO_CHAR( start_day + rownum, 'D' ) IN ( 2, 3, 4, 5, 6 ) THEN 1
                    ELSE 0
                  END ) business_days
           FROM dual a,
                ( SELECT TO_DATE( :c, 'DD-MON-YY' ) start_day,
                         TO_DATE( '30-MAR-08', 'DD-MON-YY' ) end_day
                    FROM dual ) b
        CONNECT BY LEVEL <= TRUNC( end_day - start_day ) )
 GROUP BY start_day, end_day, days_between;

START_DAY  END_DAY        DB     BD    HPD   HPDW   HPWD   HPWW
---------- ---------- ------ ------ ------ ------ ------ ------
08/26/2007 03/30/2008    217    155    8.2   57.5   11.5   57.5

DB = Days Between
BD = Business Days
HPD = Hours Per Day
HPDW = Hours Per Day/Week
HPWD = Hours Per Work Day
HPWW = Hours Per Work Week
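Those numbers can be sanity-checked in plain Java (the dates and the 1784 hours come from above; the class name is mine):

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

public class HoursMath {

    // Counts Mon-Fri days in (start, end] -- the SQL above generates
    // rows for start_day + 1 .. end_day, so the start date is excluded.
    public static long businessDays( LocalDate start, LocalDate end ) {
        long count = 0;
        for ( LocalDate d = start.plusDays( 1 ); !d.isAfter( end ); d = d.plusDays( 1 ) ) {
            DayOfWeek dow = d.getDayOfWeek();
            if ( dow != DayOfWeek.SATURDAY && dow != DayOfWeek.SUNDAY ) {
                count++;
            }
        }
        return count;
    }
}
```

This agrees with the query output: 217 days, 155 business days, and 1784 / 155 works out to the same 11.5 hours per work day.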

The scary part is that I am not very diligent about entering my time. It's probably short by anywhere between 5 and 10%.

I'm not the only one working these kinds of hours either. I know for a fact there are others.

Do you think this plays a role in my failed deployments?

Friday, April 18, 2008

Datawarehouse: Testing

Until recently, I have been doing support/enhancements and new development for one particular project.

In our development environment, I have essentially been doing full load testing, using all of the records that are in production.

Needless to say, our development iterations were slow. Make a change to the code and then reload some 100 million records which are bounced against one another. The good thing is that I've picked up some really good tuning skills. The bad thing is my project was recently 2 months late.

Now that I've had a chance to breathe, I've been able to rethink some of the processes. First and foremost being how we test (both in development and QA).

If I had taken the time one year ago to build a "build" and "teardown" script, I probably could have shaved months off the project life cycle.

I used SQLUnit a few years ago and found it to be a pretty good tool. At the time, it consisted of writing out XML files and was quite time consuming, but it gave me more confidence in the code I wrote. It also taught me to think like a tester, forcing me to write better code, since I could more easily predict where the problems might be.

We've used no such tool at WellCare.

I believe database testing to be extremely difficult. Our unit tests have consisted of running the procedures and then pasting the output into a Word document...not really my idea of good testing, but it's what we have.

So are there any tools that you use to test database specific items? If you don't use tools, how do you do it? Why does database testing seem so difficult?

Thursday, April 17, 2008

Failed Deployments: An Index

Since there seem to be so many now, I'm creating this index page to track them. Enjoy!

The Countdown Timer - a brief note on the origin of the countdown timer adorning my site.

Good Day to Worse Day - This is the day that the countdown timer was first started.

DELETEing an entire production table.

Blowing up the Rate Application.

Another Failed Deployment...

Just before I went on vacation I (we) deployed an application so that our finance group could maintain their own rates.

We were doing between 5 and 8 critical change requests a month, which was unacceptable. I fought months ago to bring the rates into the data warehouse so that we could own them and eventually build this application (in APEX, of course).

So what happened?

From my perspective:
1. I was rushing so that I could go on vacation without worry.
2. I was doing support work while trying to build the application.
3. I added create_date into the composite unique key without at the very least truncating SYSDATE. This created duplicate records. This was the impetus behind my post the other day named "How do you audit?"

We performed the official root cause analysis today and here's what came out of it:
1. There is no defined or official design process.
2. No official or formal design review was performed.
3. Chet sucks!

It was a good exercise. I believe I am always open and upfront about mistakes. I welcome criticism.

Length between failed deployments: 5 days

I guess it's better to get them out of the way in that short time span. It's never fun and it's always embarrassing. Ultimately, I just have to accept it and move on.


Updated 03/08/2010
Original source can be found here.

Based on comments, specifically from caringaboutsecurity down below, I've (finally) updated the source code.

Another point: he is correct that there is no key used in this code. A salt is used, but no key. A key would need to be passed in along with the SSN and stored in some location. This code is not concerned with the overall security aspects, just with encrypting the value. I am not qualified to speak to the over-arching domain of security.

The package specification is also provided in this updated version.

In the comments from my previous post, Tyler and Tom both mentioned the fact that you should never store sensitive data as plain-text in the database. I used a bad example unfortunately, but Tom did mention the DBMS_CRYPTO package which I have used before.

Since I have used it and published a working example to the OTN forums, I figured I'd put it here as well.

The goal is to encrypt an SSN and store the encrypted string (RAW) in the database. In conjunction with VPD, you should be able to easily control who can see the plain-text version of the SSN. To do that, you will need to create another function which decrypts the encrypted SSN.

This is a pretty basic example, but it should help to get you started. You can read more about the DBMS_CRYPTO package in the docs.

CREATE OR REPLACE PACKAGE p_encrypt
AS
  FUNCTION encrypt_ssn( p_ssn IN VARCHAR2 ) RETURN RAW;
  FUNCTION decrypt_ssn( p_ssn IN RAW ) RETURN VARCHAR2;
END p_encrypt;
/
show errors

CREATE OR REPLACE PACKAGE BODY p_encrypt
AS
  -- character set used for the VARCHAR2 <-> RAW conversions
  G_CHARACTER_SET   VARCHAR2(10) := 'AL32UTF8';
  -- hard-coded key material for demonstration purposes only
  G_STRING          VARCHAR2(32) := '12345678901234567890123456789012';
  G_KEY             RAW(250)     := utl_i18n.string_to_raw
                                    ( data        => G_STRING,
                                      dst_charset => G_CHARACTER_SET );
  G_ENCRYPTION_TYPE PLS_INTEGER  := dbms_crypto.encrypt_aes256
                                    + dbms_crypto.chain_cbc
                                    + dbms_crypto.pad_pkcs5;

  FUNCTION encrypt_ssn( p_ssn IN VARCHAR2 ) RETURN RAW
  IS
    l_ssn       RAW(32);
    l_encrypted RAW(64);
  BEGIN
    l_ssn := utl_i18n.string_to_raw
             ( data        => p_ssn,
               dst_charset => G_CHARACTER_SET );

    l_encrypted := dbms_crypto.encrypt
                   ( src => l_ssn,
                     typ => G_ENCRYPTION_TYPE,
                     key => G_KEY );

    RETURN l_encrypted;
  END encrypt_ssn;

  FUNCTION decrypt_ssn( p_ssn IN RAW ) RETURN VARCHAR2
  IS
    l_decrypted        RAW(32);
    l_decrypted_string VARCHAR2(32);
  BEGIN
    l_decrypted := dbms_crypto.decrypt
                   ( src => p_ssn,
                     typ => G_ENCRYPTION_TYPE,
                     key => G_KEY );

    l_decrypted_string := utl_i18n.raw_to_char
                          ( data        => l_decrypted,
                            src_charset => G_CHARACTER_SET );
    RETURN l_decrypted_string;
  END decrypt_ssn;
END p_encrypt;
/
show errors
I mention it in the comments of the code, but do not forget to wrap your PL/SQL before you load it, otherwise someone will be able to easily see your salt/key.

Tuesday, April 15, 2008

How Do You Audit?

I'm not necessarily talking about system auditing, which I understand to be pretty much like throwing a switch, I'm talking about table level auditing.

Given the following table:

CREATE TABLE t
(
  id         NUMBER PRIMARY KEY,
  ssn        VARCHAR2(9),
  last_name  VARCHAR2(40),
  first_name VARCHAR2(30),
  dob        DATE
);

The requirement is to track any changes to everything but the SSN. ID is just a surrogate key.

I typically do something like this:

CREATE TABLE t
(
  id          NUMBER PRIMARY KEY,
  ssn         VARCHAR2(9),
  last_name   VARCHAR2(40),
  first_name  VARCHAR2(30),
  dob         DATE,
  create_date DATE,
  create_user VARCHAR2(30),
  end_date    DATE,
  end_user    VARCHAR2(30)
);

I added the create_ and end_ columns to see who did what when. If someone comes in and decides to change the record, the end_date and end_user are populated (thereby "terminating" the record) and a new record is created with the updated values. That then becomes the "current" record.

I know there are other ways, but I'd like to hear some of your ideas/methods.

Updated

I realized that I should not have put ID as the primary key; I should have made SSN a unique key and generated a surrogate key (ID). I've updated it to use SSN as the unique key, so the primary key can and will change (it's a surrogate), but the SSN cannot.

Updated II
Then it would fail when you create a new record because the SSN would no longer be unique. Oy, I usually read and re-read anything technically related.

So I'll remove the UNIQUE constraint from SSN and that should do the trick. SSN will be used to find the record of the appropriate person, along with END_DATE IS NULL (the "current" row).

Sunday, April 13, 2008

WellCare Data Breach...

I haven't spoken with anyone at work about this but I was aware of the event before going on vacation. I did not know any of the particulars, only that it involved Georgia in some way.

I believe everyone works hard to safeguard patient data. The short term drawback is that it will make everyone's life that much more difficult, the long term good will be a better process (I hope).