Monday, September 29, 2008

Database Design: Things to Consider

I started this one when I was unemployed last month.

By no means is the list exhaustive, just some of the things I think of when planning out my design.

What kind of data? If you’re going to be storing BLOBs of any sort, you may need a lot of space. Work with your DBA to figure out a plan.

How much data? Are you a telecom doing thousands of transactions per second? A non-profit doing 100 transactions a day? Somewhere in between?

Who are your customers? Will this be public facing? Internal only?

How will you handle security? Database Authentication? Roll your own table based authentication? If it’s a web based application, it’s probably easier to roll your own as it’s a stateless environment. If you’re doing a client/server application, Database Authentication is probably the best way to go as you won’t have to reinvent the wheel.

Can you use VPD or Application Contexts?

What type of keys? Will you stick to one method and use surrogate keys for everything? Natural keys for everything? Or a mix of both?

What version of Oracle?

Encryption?Will you need columnar encryption (SSN, Credit Card Numbers, etc)? Transparent Data Encryption?

Coming from the end-user world, I tend to start with reports. What kind of information do you need to retrieve from your data? I think that is the most important thing actually. For without good information, what's the point?

Anyway, feel free to add your own. What are some of things you consider whilst designing your database?

Friday, September 26, 2008

Job: Oracle DBA in Tampa

Here's another one. If interested, leave a comment (I won't publish) or email me, chet dot justice at the google mail domain (does this spelling out really work?).

DESCRIPTION:


The Database Analyst provides database administration support for production, development, and test environments to include day-to-day monitoring and maintenance, problem investigation and resolution, backup creation and maintenance, SQL and database optimization and tuning as well as researching, analyzing, and recommending patches, upgrades, and new technologies surrounding all aspects of data management using relational databases.

RESPONSIBILITIES:


* Provide support for all Oracle database environments to include problem identification, reporting, tracking, analysis, and resolution.

* Research and analyze all assigned anomalies - record, track and log all findings and recommendations.

* Participate in production on-call rotation (one week at a time of 7x24 on-call support for production environments).

* Monitor database performance and space utilization – investigate and resolve reported issues.

* Establish, maintain, and monitor database backups and DRP sites.

* Establish, maintain, and monitor database security, roles, user-role assignments, and individual logins per company security policies and procedures.

* Establish and maintain data replication as required for each application.

* Assist project teams in setting up and testing development, test, and production environments.

* Research, analyze, and recommend patches, tools, and/or database upgrades to resolve issues and add features/functionality.

* Architect, design, and implement replication and/or data synchronization.

* Assist project teams in designing efficient data access methods and procedures, including data structures (physical data modeling), SQL techniques, and use of database tools/utilities.

* Provide technical leadership to the DBA team by assisting junior team members and sharing knowledge and research of tools/techniques, and Oracle features.

* Serve as team liaison to project managers for database administration and system administration services.

REQUIREMENTS:


* 7 years Oracle database administration experience to include versions 9I and 10G.

* 7 years SQL experience with expert-level skills in the SQL language.

* 3 years near-real time replication experience - any tool (streams, golden gate, etc).

* 1 year RAC or ASM experience.

* RMAN/Legato backup experience.

* 4 year BS or BA degree required in a computer related field or 6 years of directly related work experience in a large-scale IT environment.

PREFERRED EXPERIENCE:


* PL/SQL development experience.

* Oracle advanced replication, streams, or Golden Gate replication.

* Oracle Application Server (not Oracle Applications 11i).

* Experience in Oracle RAC (10G).

* Experience in Oracle Advanced Security & Data Encryption.

* Transparent application failover.

Wednesday, September 24, 2008

More on Oracle Exadata Storage

I don't know why I'm so excited about this...I guess I'm trying to live up to my name.

See my previous post here full of a few screen shots from Larry's speech.

A great post on most of the stuff I don't understand from Kevin Closson who works more on the datawarehouse side.

Paul Vallee over at the Pythian Group blog ruminates here and here.

Curt Monash talks about it here and here.

Dana Gardner from ZDNet.

Larry Dignan from CNET.com via ZDNet Australia.

Michael Liedtke via AP.

And how could I forget my local Oracle blogger, Mr. Cunningham.

I'm still amazed at how excited I am. It's just a machine right? Hopefully I'll get to use one of those someday...

Oracle Exadata Storage

So the big announcement today...Larry created an appliance with HP as the hardware.

Press Release

Exadata

Here's a picture of the new beast:

In comparison to Larry:

Some of the characteristics:

And finally, it runs OLTP too!

Tuesday, September 23, 2008

Oracle in the Amazon Cloud

This is pretty old by now...I had every intention of blogging about it yesterday, but I didn't get to the computer last night.

So I'm "watching" Twitter, there's a whole lot of activity from Oracle Open World, and I see an update from oracletechnet (a.k.a. Justin Kestelyn, the OTN Editor-in-Chief) about the new Cloud Computing Center on OTN.

Of particular interest is the Flash demo of provisioning an EC2 instance with Oracle provided virtual machine images (Amazon Machine Images or AMIs).

Licensing information can be found here.  The FAQs can be found here.

The one problem that I haven't seen resolved yet with the Cloud setup (hosted offsite) is security.  As a financial services company, it would be difficult if not impossible.  I did ponder the ability to use the Cloud as a dev and QA environment, as long as everything was masked before hand and you used Transparent Data Encryption (TDE).  Also, as the FAQ states, it might be a good place to store your backups.

Sunday, September 14, 2008

Precious Memory?

My six year old son is brilliant. He's got my sense of humor with his mother's verbal wit.

This weekend, he says to us, "Mom, remember when I peed on you when I was a baby?" He starts laughing..."aaaah, precious memories."

What? "Precious" memories? Where the hell did he get that from?

Tuesday, September 9, 2008

Fun with SQL: Analytics and Heirarchical

I've had this problem since yesterday and I believe I finally solved it.

Given this data:
       ID    DIFF_ID START_DAT END_DATE      AMOUNT
--------- ---------- --------- --------- ----------
1 4 01-JAN-08 31-JAN-08 40
2 4 01-FEB-08 29-FEB-08 0
3 4 01-MAR-08 31-MAR-08 10
4 4 01-APR-08 30-APR-08 10
5 4 01-MAY-08 31-MAY-08 0
6 1 01-JAN-08 31-JAN-08 10
7 1 01-FEB-08 29-FEB-08 0
8 1 01-MAR-08 31-MAR-08 10
9 1 01-APR-08 30-APR-08 10
10 1 01-MAY-08 31-MAY-08 10
For each consecutive time period (month) that there is an amount, count how many buckets, up to six.

First thought was definitely Analytics. I toiled away on what became a very unwieldy query (took more than one page anyway). A whole bunch of LAGs with the same number of ever increasing CASE statements.

My first obstacle overcome was to filter out those that had a 0 for amount. That left me with:
        ID    DIFF_ID START_DAT END_DATE      AMOUNT
---------- ---------- --------- --------- ----------
1 4 01-JAN-08 31-JAN-08 40
3 4 01-MAR-08 31-MAR-08 10
4 4 01-APR-08 30-APR-08 10
6 1 01-JAN-08 31-JAN-08 10
8 1 01-MAR-08 31-MAR-08 10
9 1 01-APR-08 30-APR-08 10
10 1 01-MAY-08 31-MAY-08 10
It took a good while to figure that out for some reason.

Once I had that figured, I needed to figure out which were consecutive. Frank Zhou is always solving puzzles with SQL and I remembered I had responded to one of his about a year ago. If you get a chance, please take a look at his site...he solves some pretty cool puzzles with SQL using the MODEL clause and analytics.

Anyway, his post, How to find the earliest start date and the latest end date for consecutive transactions in SQL was similar (and my response similar), so I found it to revisit my thinking at the time.

First, I use the LAG function to get the previous row's ID (unique) and call it PREV_ID. I use DIFF_ID in the PARTITION clause (window) and order by END_DATE; then add one to see if the months are consecutive. If that value matches the START_DATE of the current row, it's consecutive and I use LAG again to get the previous row's ID.
SELECT
diff_id,
id,
start_date,
end_date,
( CASE
WHEN LAG( end_date ) OVER
( PARTITION BY diff_id
ORDER BY end_date ASC ) + 1 = start_date
THEN
LAG( id ) OVER
( PARTITION BY diff_id
ORDER BY end_date ASC )
ELSE NULL
END ) prev_id,
amount
FROM col_test
WHERE amount > 0
That produces the following output:
 DIFF_ID         ID START_DAT END_DATE     PREV_ID     AMOUNT
-------- ---------- --------- --------- ---------- ----------
1 6 01-JAN-08 31-JAN-08 10
1 8 01-MAR-08 31-MAR-08 10
1 9 01-APR-08 30-APR-08 8 10
1 10 01-MAY-08 31-MAY-08 9 10
4 1 01-JAN-08 31-JAN-08 40
4 3 01-MAR-08 31-MAR-08 10
4 4 01-APR-08 30-APR-08 3 10
As you can see, I have 3 records with the PREV_ID populated.

As I am building it, I realize I keep nesting the queries, so in comes the WITH clause (when I first learned of that it was terribly difficult to search for, I didn't know it was also called subquery factoring clause...).
WITH sub
AS
(
SELECT
diff_id,
id,
start_date,
end_date,
( CASE
WHEN LAG( end_date ) OVER
( PARTITION BY diff_id
ORDER BY end_date ASC ) + 1 = start_date
THEN
LAG( id ) OVER
( PARTITION BY diff_id
ORDER BY end_date ASC )
ELSE NULL
END ) prev_id,
amount
FROM col_test
WHERE amount > 0
)
SELECT
diff_id,
id,
start_date,
end_date,
TO_DATE( SUBSTR( SYS_CONNECT_BY_PATH(
TO_CHAR( start_date, 'MMDDYYYY' ), '-' ), 2, 8 ), 'MMDDYYYY' ) min_start_date
FROM sub
START WITH prev_id IS NULL
CONNECT BY PRIOR id = prev_id
Much better. Note the START WITH and CONNECT BY PRIOR, I created my own heirarchical table to determine another window to PARTITION on (MIN_START_DATE of the consecutive records).
DIFF_ID         ID START_DAT END_DATE  MIN_START
------- ---------- --------- --------- ---------
1 6 01-JAN-08 31-JAN-08 01-JAN-08
1 8 01-MAR-08 31-MAR-08 01-MAR-08
1 9 01-APR-08 30-APR-08 01-MAR-08
1 10 01-MAY-08 31-MAY-08 01-MAR-08
4 1 01-JAN-08 31-JAN-08 01-JAN-08
4 3 01-MAR-08 31-MAR-08 01-MAR-08
4 4 01-APR-08 30-APR-08 01-MAR-08
Now all I have to do is PIVOT the table (I chose not to use the new PIVOT feature) on DIFF_ID and add an analytic COUNT on my new window (MIN_START_DATE).
WITH sub
AS
(
SELECT
diff_id,
id,
start_date,
end_date,
( CASE
WHEN LAG( end_date ) OVER
( PARTITION BY diff_id
ORDER BY end_date ASC ) + 1 = start_date
THEN
LAG( id ) OVER
( PARTITION BY diff_id
ORDER BY end_date ASC )
ELSE NULL
END ) prev_id,
amount
FROM col_test
WHERE amount > 0
)
SELECT
diff_id,
COUNT( CASE WHEN d = 1 THEN 1 ELSE NULL END ) b1,
COUNT( CASE WHEN d = 2 THEN 1 ELSE NULL END ) b2,
COUNT( CASE WHEN d = 3 THEN 1 ELSE NULL END ) b3,
COUNT( CASE WHEN d = 4 THEN 1 ELSE NULL END ) b4,
COUNT( CASE WHEN d = 5 THEN 1 ELSE NULL END ) b5,
COUNT( CASE WHEN d = 6 THEN 1 ELSE NULL END ) b6
FROM
(
SELECT
diff_id,
COUNT( id ) OVER
( PARTITION BY diff_id, SUBSTR(
SYS_CONNECT_BY_PATH(
TO_CHAR( start_date, 'MMDDYYYY' ), '-' ), 2, 8 )
ORDER BY end_date ) d
FROM sub
START WITH prev_id IS NULL
CONNECT BY PRIOR id = prev_id
)
GROUP BY diff_id;
And voila!
DIFF_ID   B1   B2   B3   B4   B5   B6
------- ---- ---- ---- ---- ---- ----
1 2 1 1 0 0 0
4 2 1 0 0 0 0
Problem solved!

Table creation and data can be found here.

Friday, September 5, 2008

DBMS_SQL.TO_CURSOR_NUMBER

One of our programmers asked me the other day if it was possible to retrieve the meta-deta (columns, column types, etc) from a ref cursor. I told him that it's easy to do in Java (which I believe is one of his strong suits), but that's not what he wanted.

He wanted to do it in SQL or PL/SQL. Essentially, he wanted to be able to generate type-safe classes (I could look it up but it's Friday and I'm lazy). Also as a beginning to generate code so it can be moved out of the database into the web service layer. It hasn't been completely decided whether that will happen or not...once it is, I will go with the flow (or find a new job if I can't handle it right?).

So, how to do it in PL/SQL? I wasn't sure, but I immediately thought that if it could be done, it would be by using the DBMS_SQL package. So I opened the docs to see what was available to me.

Reading through, I found a procedure, TO_CURSOR_NUMBER. Basically it allows you to accept a ref cursor as an input parameter and convert it to a DBMS_SQL cursor handle. Then you can use all the goodness of DBMS_SQL as you normally would.

I created a short working example (which doesn't differ a whole lot from the example provided). Here's the function returning a ref cursor:

CREATE OR REPLACE
FUNCTION get_cursor RETURN SYS_REFCURSOR
IS
c SYS_REFCURSOR;
BEGIN
OPEN c FOR
SELECT
owner,
table_name,
tablespace_name,
status,
logging,
last_analyzed
FROM all_tables
WHERE rownum < 51;
RETURN c;
END get_cursor;
/
show errors
And here's the anonymous block I used to transform the ref cursor into a DBMS_SQL cursor:

DECLARE
c SYS_REFCURSOR;
l_cursorid NUMBER;
l_column_count INTEGER;
l_describe_table DBMS_SQL.DESC_TAB;
l_numvar NUMBER;
PROCEDURE p( i_text IN VARCHAR2 )
IS
BEGIN
dbms_output.put_line( i_text );
END p;
BEGIN
c := get_cursor;

l_cursorid := dbms_sql.to_cursor_number( c );
p( 'Cursor ID: ' || l_cursorid );

dbms_sql.describe_columns( l_cursorid, l_column_count, l_describe_table );
p( 'Column Count: ' || l_column_count );
p( 'DESC_TAB Count: ' || l_describe_table.COUNT );


FOR i IN 1..l_describe_table.COUNT LOOP
p( 'Column: ' || l_describe_table(i).col_name );
END LOOP;

dbms_sql.close_cursor( l_cursorid );
END;
/
I get tired of typing out DBMS_OUTPUT.PUT_LINE every 2 lines, so if I am going to use it more than once or twice, I use a procedure named "p" to do so.

Here's the output when I run it:

Cursor ID: 1438299795
Column Count: 6
DESC_TAB Count: 6
Column: OWNER
Column: TABLE_NAME
Column: TABLESPACE_NAME
Column: STATUS
Column: LOGGING
Column: LAST_ANALYZED
Fairly easy to use. It took about 20 minutes to read the docs and write it up. As you can see from the definition of the DBMS_SQL.DESC_TAB (below), you have access to quite a number of attributes of the cursor.

TYPE desc_rec IS RECORD (
col_type BINARY_INTEGER := 0,
col_max_len BINARY_INTEGER := 0,
col_name VARCHAR2(32) := '',
col_name_len BINARY_INTEGER := 0,
col_schema_name VARCHAR2(32) := '',
col_schema_name_len BINARY_INTEGER := 0,
col_precision BINARY_INTEGER := 0,
col_scale BINARY_INTEGER := 0,
col_charsetid BINARY_INTEGER := 0,
col_charsetform BINARY_INTEGER := 0,
col_null_ok BOOLEAN := TRUE );
I think I found (with the help of my DBA) the conversion of the col_type (which is an integer) to it's text equivalent, but I can't say for certain yet as I haven't tried it.

Thursday, September 4, 2008

Jobs: Data Architect/Software Developer

Does anyone get anything out of me posting jobs? For or against?

So I like helping people out? What's wrong with that? ;) I will receive no money from this...

This is a job I never interviewed for. I did a phone screen with the Recruiter (a Chase Paymentech recruiter), but I received an offer from Revolution Money later that afternoon and accepted.

She found me resume on Monster and when I read it (the font size was like 72!), I thought this was tailor made for me.

Either post a comment (I won't publish it) or email me at chet dot justice at gmail for contact information. Here are the requirements:

Data Architect/Software Developer will work as part of a team to refractor Chase Paymentech's current data structures and design new solutions.

This is a great opportunity for an experienced Oracle Developer who is looking to move into a Data Architect role.

Bachelor's degree with major in Computer Science with seven years of progressive development experience.

Ideal candidate will have experience working in an Oracle environment with OLAP and DSS databases along with strong PL/SQL coding skills.

This position also requires strong data modeling skills and knowledge of Informatica and database release process in order to be successful.

Candidate should be able to demonstrate experience in a mentoring or training role.

I will add that this position is new and so is the "department." It's kind of "start-uppy" within this giant organization. That's something that really appealed to me.

Monday, September 1, 2008

Design: The Entity = The Party

I've discussed the merits of design and the entity here and here.

In a recent interview, the prospective employer mentioned the "party" model. I had no idea what they were talking about at the time only that it was similar to my entity model.

In another interview, I was asked about subtyping. I didn't know the vernacular as it pertained to database modeling, but I went on to explain the entity model. He told me they were one and the same! Now I have a name for it and I came to something that others had already "invented." While it would have been easier to read one of the books on modeling that discussed the Party Model, but I can't seem to read technical books (online is a different story for some reason). I also think it's pretty cool I came to the same conclusion as others outside of their influence. I do have to wonder though if I took it in at some point of time but don't explicitly recall it.

Anyway, it's definitely nice to have an idea validated.

Below are some general links on data modeling and specific ones on the Party Model:
Data Modeling on Wikipedia
A Universal Person and Organization Data Model
Siebel/Oracle - Party Data Model
Party Information Framework