Saturday, November 24, 2007

Parallel Processing using DBMS_JOB

I found this article by ProdLife through the OraNA feed. It talked about running a report based on multiple queries, and it reminded me of something I did a while back.

We have a multi-step process that loads data into 2 tables, which the business uses to reconcile the money coming in the door with our membership. Membership is at a monthly grain (member month), while our money is transactional (a member may have multiple transactions within a given month).


One table stores the transactions joined with our members. It isn't at the grain the business needs, but it is useful for research. The other table summarizes the transactions to the month level and then joins them with our membership so that both are at the same grain. Currently we're pulling across about 27 million records for members and roughly the same number for their transactions.

On the right is a basic diagram of the process.

The process initially took 8 hours to complete, partly because it ran sequentially. However, not all parts of the process depend on one another; only the final 2 steps (Target Table 1 and Target Table 2, in yellow) need to run sequentially.
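To make that dependency structure concrete, here is a minimal Python model of it (an analogy, not the PL/SQL used in this post). It assumes the independent loads are the `get_dollars` and `get_members` steps that appear in the code later on, and uses hypothetical `load_target_1` and `load_target_2` steps for Target Table 1 and Target Table 2. Each step only records its name, so the ordering can be checked.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical stand-ins for the real load steps; each one only records
# that it ran, so the execution order can be inspected afterwards.
log = []
log_lock = threading.Lock()


def make_step(name):
    def step():
        with log_lock:
            log.append(name)
    return step


# Steps assumed to have no dependencies on each other.
independent_steps = [make_step("get_dollars"), make_step("get_members")]
# Steps that must run in order, after every independent step has finished.
sequential_steps = [make_step("load_target_1"), make_step("load_target_2")]


def run_process():
    # Fan out: submit every independent step at once.
    with ThreadPoolExecutor(max_workers=len(independent_steps)) as pool:
        futures = [pool.submit(step) for step in independent_steps]
        # Fan in: wait for all of them; result() re-raises a step's exception.
        for future in futures:
            future.result()
    # The target tables then load strictly one after the other.
    for step in sequential_steps:
        step()


run_process()
print(log)
```

The first two entries can appear in either order; the target tables always come last. Waiting on every future is the fan-in point, the same role the polling loop plays for DBMS_JOB later in the post.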


I wanted to speed this up and began thinking about ways to do it (assuming that as much tuning as possible had already been done).

1. Use our scheduler or Unix shell scripts.

2. Use a table based approach as ProdLife did.

3. Utilize PL/SQL and DBMS_JOB.

I initially chose number 3, and that's the focus of this post. At the end I'll explain why I didn't end up using it.

The first thing I had to figure out was how to get PL/SQL to wait. Having read a few posts on AskTom, I remembered the SLEEP procedure, and a quick search of the site showed that it is part of the DBMS_LOCK package. I asked the DBAs to grant me access to it so that I could begin testing.

I figured that once I could make PL/SQL wait, it would be easy to "poll" the USER_JOBS view periodically to see when each job had finished. I'm only going to show code snippets, as the whole thing can get quite long.

I first determined that the error Oracle returns for a job that isn't in the job queue is ORA-23421, so I map it to a named exception; later it lets me ignore jobs that have already finished when cleaning up. Next, I declared a variable for each job to run.

DECLARE
  -- ORA-23421: job is not in the job queue
  no_job EXCEPTION;
  PRAGMA EXCEPTION_INIT( no_job, -23421 );
  l_exists      NUMBER;
  l_dollars_job NUMBER;  -- job number returned by DBMS_JOB.SUBMIT
  l_members_job NUMBER;

The first thing I do in the body is create the jobs using DBMS_JOB.SUBMIT.

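BEGIN
  -- Queue each independent load as its own job. No INTERVAL is given,
  -- so each job runs once and then drops out of the queue.
  dbms_job.submit
  ( job       => l_dollars_job,
    what      => 'BEGIN p_mypackage.get_dollars; COMMIT; END;',
    next_date => SYSDATE );

  dbms_job.submit
  ( job       => l_members_job,
    what      => 'BEGIN p_mypackage.get_members; COMMIT; END;',
    next_date => SYSDATE );

  -- The job queue does not see the new jobs until this COMMIT.
  COMMIT;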

Make sure you issue the COMMIT after submitting the jobs: DBMS_JOB is transactional, so the job queue won't pick them up until the submitting transaction commits.

Here's the fun part. I created a loop that calls DBMS_LOCK.SLEEP to wait 60 seconds. After each wait, I check whether each job is still in the USER_JOBS view. With 100 iterations of 60 seconds, the jobs have up to 100 minutes to complete.

  -- Poll once a minute, for at most 100 minutes.
  FOR i IN 1..100 LOOP
    dbms_lock.sleep( 60 );

    IF l_dollars_job IS NOT NULL THEN
      BEGIN
        SELECT 1
          INTO l_exists
          FROM user_jobs
         WHERE job = l_dollars_job;

        l_exists := NULL;  -- still queued or running
      EXCEPTION
        WHEN no_data_found THEN
          l_dollars_job := NULL;  -- job is finished
      END;
    END IF;

    IF l_members_job IS NOT NULL THEN
      BEGIN
        SELECT 1
          INTO l_exists
          FROM user_jobs
         WHERE job = l_members_job;

        l_exists := NULL;  -- still queued or running
      EXCEPTION
        WHEN no_data_found THEN
          l_members_job := NULL;  -- job is finished
      END;
    END IF;

The next step is to decide when to exit the loop. Hopefully the jobs finish in time and the process moves on to the next step, but if not, you want to exit gracefully. Well, semi-gracefully anyway.

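    IF l_dollars_job IS NULL
       AND l_members_job IS NULL
    THEN
      EXIT;  -- both jobs finished
    ELSIF i = 100 THEN
      -- Out of time: take any unfinished jobs off the queue.
      IF l_dollars_job IS NOT NULL THEN
        BEGIN
          dbms_job.remove( l_dollars_job );
        EXCEPTION
          WHEN no_job THEN
            NULL;  -- finished between the last check and now
        END;
      END IF;

      IF l_members_job IS NOT NULL THEN
        BEGIN
          dbms_job.remove( l_members_job );
        EXCEPTION
          WHEN no_job THEN
            NULL;
        END;
      END IF;

      -- Commit the removals; the unhandled error below could otherwise
      -- roll them back along with the rest of the block's work.
      COMMIT;

      -- abort run, taking too long
      raise_application_error( -20001, 'DOLLARS/MEMBERS data not loaded in time' );
    END IF;
  END LOOP;
END;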

That's all there is to it.
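The same submit-and-poll pattern can be modeled outside the database. This Python sketch is only an analogy under stated assumptions: the hypothetical `JobQueue` class stands in for the job queue and USER_JOBS, threads stand in for job queue processes, and the hypothetical `wait_for_jobs` follows the structure of the PL/SQL loop (sleep, check which jobs are still queued, exit when none are, and on the last iteration remove the stragglers and raise).

```python
import itertools
import threading
import time


class JobQueue:
    """Hypothetical stand-in for the DBMS_JOB queue / USER_JOBS view.

    A job id stays in `pending` while the job is queued or running and is
    dropped when the job completes successfully, which is the signal the
    polling loop looks for.
    """

    def __init__(self):
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self.pending = set()
        self.cancelled = set()

    def submit(self, work):
        job_id = next(self._ids)
        with self._lock:
            self.pending.add(job_id)  # visible before the job starts

        def run():
            work()  # if this raises, the job never leaves `pending`
            with self._lock:
                self.pending.discard(job_id)

        threading.Thread(target=run, daemon=True).start()
        return job_id

    def exists(self, job_id):
        with self._lock:
            return job_id in self.pending

    def remove(self, job_id):
        # Like ignoring ORA-23421: removing a finished job is not an error.
        # This only forgets the job; it does not stop work already running.
        with self._lock:
            if job_id in self.pending:
                self.pending.discard(job_id)
                self.cancelled.add(job_id)


def wait_for_jobs(queue, job_ids, poll_seconds, max_polls):
    """Sleep and poll until all jobs leave the queue; give up after max_polls.

    The total wait budget is max_polls * poll_seconds (100 * 60 s in the post).
    Returns the number of polls it took.
    """
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    outstanding = set(job_ids)
    for i in range(1, max_polls + 1):
        time.sleep(poll_seconds)
        # A job that is no longer in the queue has finished.
        outstanding = {j for j in outstanding if queue.exists(j)}
        if not outstanding:
            return i
        if i == max_polls:
            for j in outstanding:
                queue.remove(j)
            raise TimeoutError(f"jobs {sorted(outstanding)} did not finish in time")


queue = JobQueue()
fast_jobs = [queue.submit(lambda: time.sleep(0.01)) for _ in range(2)]
polls_needed = wait_for_jobs(queue, fast_jobs, poll_seconds=0.1, max_polls=20)

slow = queue.submit(lambda: time.sleep(2.0))
try:
    wait_for_jobs(queue, [slow], poll_seconds=0.05, max_polls=3)
    timed_out = False
except TimeoutError:
    timed_out = True
print(polls_needed, timed_out)
```

In the model, a job whose work raises never leaves the queue, so the loop runs out its budget. DBMS_JOB behaves somewhat similarly: a failed job is retried rather than removed, so it stays in USER_JOBS. Checking the FAILURES or BROKEN columns of USER_JOBS inside the loop could be one way to fail faster.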

In the end, though, I was convinced not to use this method because restartability would be difficult. Perhaps combining this method with the table-based approach would be ideal. I'll leave that for another day, though.
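Since restartability was the sticking point, here is one way a control table could provide it, sketched in Python under assumptions: `status_table` stands in for a hypothetical database table recording each completed step, and the step names are hypothetical. Steps run sequentially here for brevity; a combined version might still submit the independent steps as jobs, with each one marking its own row when it finishes. A rerun skips every step already marked DONE, so after a failure the process resumes at the step that failed.

```python
# Hypothetical control table: step name -> status. In the database this
# would be a real table, updated and committed as each step finishes.
status_table = {}
calls = []  # records which steps actually executed, for illustration


def run_steps(steps, status):
    """Run (name, work) pairs in order, skipping steps already marked DONE."""
    for name, work in steps:
        if status.get(name) == "DONE":
            continue  # finished in an earlier run; don't redo it
        work()  # an exception here stops the run with this step not DONE
        status[name] = "DONE"


def recorder(name):
    def work():
        calls.append(name)
    return work


attempts = {"load_target_2": 0}


def flaky_target_2():
    # Simulated step that fails on its first attempt only.
    attempts["load_target_2"] += 1
    calls.append("load_target_2")
    if attempts["load_target_2"] == 1:
        raise RuntimeError("simulated failure")


steps = [
    ("get_dollars", recorder("get_dollars")),
    ("get_members", recorder("get_members")),
    ("load_target_1", recorder("load_target_1")),
    ("load_target_2", flaky_target_2),
]

try:
    run_steps(steps, status_table)  # first run fails at load_target_2
except RuntimeError:
    pass
run_steps(steps, status_table)  # restart: only load_target_2 runs again
print(calls)
# ['get_dollars', 'get_members', 'load_target_1', 'load_target_2', 'load_target_2']
```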
