Showing posts with label dbms_job. Show all posts

Tuesday, November 18, 2008

A Java PL/SQL Port Scanner

Saturday night I started to wonder how I could get the movies off my DVR. Recently Verizon FIOS added a boatload of HD stations, so I've naturally begun to record those. Problem is, the 50 hours or so I have available are quickly eaten up by the HD versions.

We have the multi-room DVR setup, which means we have one DVR but we can view those movies from any TV (with a cable box). Naturally all these boxes are on my internal network (router provided by Verizon).

I understand that with some boxes you can go out and buy additional storage, but I wanted a DIY solution. So, if I could find an open port on one of these boxes, then perhaps I could figure out a way to get the movies onto my computer.

Java to the rescue. Well, not really, but I have always wanted to learn more about that kind of lower-level programming (sockets/ports/TCP/whatever). See, I don't really know what I'm talking about, which is why I did it.

Since I use JDeveloper for writing SQL and PL/SQL, creating a Java class (with a main method) is easy.
import java.io.IOException;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.Date;

public class Class1
{
    private static void p( String text )
    {
        System.out.println( text );
    }

    public static void main( String[] args ) throws UnknownHostException, IOException
    {
        Date startDate = new Date();
        String machine = "192.168.1.102";

        Socket socket;
        int j = 0;
        int x = 0;
        p( "start date/time: " + startDate.toString() );
        for ( int i = 1000; i < 10000; i++ )
        {
            x++;
            p( "checking port: " + machine + ":" + i );
            try
            {
                socket = new Socket( machine, i );
                p( "Port open: " + machine + ":" + i );
                j++;
                socket.close(); // don't leak the connection
            }
            catch (UnknownHostException e) { }
            catch (IOException e) { }
        }
        p( "ports checked: " + x );
        p( "ports open: " + j );
        p( "start date/time: " + startDate.toString() );
        p( "end date/time: " + new Date().toString() );
    }
}
As LC would say, "That's not so bad."

Only it was slow, 1 port at a time. I did learn that trying to open up a port connection is s l o w. How can I speed this up?

I could make it threaded right? No. That would take too much time.
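In hindsight, threading it isn't as bad as I feared. Here's a minimal sketch of what a threaded version could look like — the class name, pool size, and connect timeout are all my own inventions, not anything from the original — using an ExecutorService so we probe up to N ports at a time instead of one:

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class ThreadedScanner
{
    // Try to connect to one port; return the port number if open, -1 otherwise.
    static int probe( String host, int port, int timeoutMillis )
    {
        try ( Socket s = new Socket() )
        {
            s.connect( new InetSocketAddress( host, port ), timeoutMillis );
            return port;
        }
        catch ( Exception e )
        {
            return -1; // refused or timed out: treat as closed
        }
    }

    // Scan [fromPort, toPort] with a fixed pool of worker threads; return the open ports.
    public static List<Integer> scan( String host, int fromPort, int toPort, int threads )
        throws InterruptedException, ExecutionException
    {
        ExecutorService pool = Executors.newFixedThreadPool( threads );
        List<Future<Integer>> futures = new ArrayList<>();
        for ( int port = fromPort; port <= toPort; port++ )
        {
            final int p = port;
            futures.add( pool.submit( () -> probe( host, p, 500 ) ) );
        }
        List<Integer> open = new ArrayList<>();
        for ( Future<Integer> f : futures )
        {
            int result = f.get();
            if ( result != -1 )
            {
                open.add( result );
            }
        }
        pool.shutdown();
        return open;
    }

    public static void main( String[] args ) throws Exception
    {
        // demo on a small local range; swap in "192.168.1.102", 1000, 9999 for the DVR
        System.out.println( "open ports: " + scan( "127.0.0.1", 8075, 8085, 11 ) );
    }
}
```

A bounded pool matters here: one thread per port (9,000 threads) would fall over, while 50 or so workers keeps the slow connect attempts overlapped without exhausting the machine.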

But hey, I know how to do it in PL/SQL (threaded that is). UTL_TCP should work nicely. I've never had to use it, so a great opportunity to familiarize myself with it (not that it's needed very often).

So I start Oracle XE and initially just try it out to see how it works:
DECLARE
  c UTL_TCP.CONNECTION;
BEGIN
  c := utl_tcp.open_connection
         ( remote_host => '192.168.1.2',
           remote_port => 1,
           tx_timeout  => 1 );

  utl_tcp.close_connection( c );
EXCEPTION
  WHEN others THEN
    utl_tcp.close_connection( c );
END;
/
Easy enough. Now let's use DBMS_JOB to "thread" this.
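One thing first: the jobs below call check_port as a stored procedure, which I haven't shown. It was basically the test block above wrapped in a procedure — the exact signature and the OPEN_PORTS insert here are my reconstruction, not the original:

```sql
CREATE OR REPLACE PROCEDURE check_port
  ( p_host IN VARCHAR2,
    p_port IN PLS_INTEGER )
AS
  c UTL_TCP.CONNECTION;
BEGIN
  c := utl_tcp.open_connection
         ( remote_host => p_host,
           remote_port => p_port,
           tx_timeout  => 1 );

  -- if we got here the port is open; record it
  INSERT INTO open_ports ( ip, port ) VALUES ( p_host, p_port );

  utl_tcp.close_connection( c );
EXCEPTION
  WHEN others THEN
    NULL; -- refused or timed out; port is closed
END check_port;
/
```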
DECLARE
  TYPE r_record IS TABLE OF OPEN_PORTS.IP%TYPE INDEX BY BINARY_INTEGER;
  l_table    R_RECORD;
  job_count  PLS_INTEGER;
  job_number PLS_INTEGER;
BEGIN
  l_table(1) := '192.168.1.102';

  FOR i IN 1..l_table.COUNT LOOP
    FOR j IN 1..10000 LOOP
      SELECT COUNT(*)
        INTO job_count
        FROM user_jobs;

      WHILE job_count < 20 LOOP
        dbms_job.submit
          ( job  => job_number,
            what => 'BEGIN check_port( ''' || l_table(i) || ''', ' || j || ' ); COMMIT; END;' );

        COMMIT;
      END LOOP;
    END LOOP;
  END LOOP;
END;
/
/
So it's running, and running, and running...I opened up another session and did a count on user_jobs. 10000. 12000. 20000. Uh, I think there's something wrong here.

I finally kill the session and there are 36,446 jobs in the queue. All checking the same port.

Then I can't even issue a simple SELECT statement. Maximum number of sessions reached or some such rubbish. Just a reminder, it was Saturday night and I was just trying to see if I could do this quickly.

Since I couldn't connect, I tried bouncing the database (a loud gasp from the DBAs out there). Come on! It's a local XE instance...there's nothing important there.

Start it back up, connect as SYS and still can't issue any queries. I do notice as I hit the / sign and the Enter key a bunch of times that once in a while it will let me do a query. So I create a quick PL/SQL block to remove all the jobs from the queue.
BEGIN
  FOR i IN ( SELECT job FROM user_jobs ) LOOP
    dbms_job.remove( i.job );
  END LOOP;
END;
/
I go through the / + Enter key about 50 times until it finally runs. Success!

Or so I think. Still receiving the max sessions error. There are still 36,446 jobs in user_jobs. I can't remove them anymore as I get "job doesn't exist" when I try to dbms_job.remove it again. Hmmm...How about?
DELETE FROM dba_jobs;

36,446 rows deleted.
What?

How can that be? I didn't expect that to work at all...but it did. I have yet to look up the particulars of my actions (deleting from dba_jobs) but it did the job.

And yes, I'm still planning on writing this (correctly) in the near future...I want more movies!
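For the record, the bug above is that job_count is only queried before the WHILE loop, never inside it, so it stays below 20 forever and the loop submits the same port endlessly. A corrected submission loop would re-check the queue and sleep when it's full — something like this (a sketch, untested; it assumes the check_port procedure exists and that you have EXECUTE on DBMS_LOCK):

```sql
DECLARE
  job_count  PLS_INTEGER;
  job_number PLS_INTEGER;
BEGIN
  FOR j IN 1..10000 LOOP
    LOOP
      SELECT COUNT(*)
        INTO job_count
        FROM user_jobs;

      EXIT WHEN job_count < 20; -- room in the queue, go submit
      dbms_lock.sleep( 1 );     -- queue full, wait and re-check
    END LOOP;

    dbms_job.submit
      ( job  => job_number,
        what => 'BEGIN check_port( ''192.168.1.102'', ' || j || ' ); COMMIT; END;' );

    COMMIT;
  END LOOP;
END;
/
```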

Saturday, November 24, 2007

Parallel Processing using DBMS_JOB

I found an article by ProdLife through the OraNA feed which talked about running a report based on multiple queries. It reminded me of something I did a while back.

We have this multi-step process which loads data into 2 tables that the business would use to reconcile our money in the door and our membership. Membership is on a month granularity (member month) and our money is transactional (they may have multiple transactions within a given month).


One table stores the transactions joined with our members. Not the correct grain that the business needs but useful for research. The other table summarizes the transactions to the month level and then is joined with our membership so that both are at the same granularity. Currently we're pulling across about 27 million records for members and the same for their transactions.

On the right is a basic diagram of the process.

The process initially took 8 hours to complete. Part of that was the fact that it runs sequentially. However, not all parts of this process are dependent on one another; only the final 2 steps (Target Table 1 and Target Table 2, in yellow) need to run sequentially.


I wanted to speed this up and began thinking about the ways to do this (assuming as much tuning as possible had already completed).

1. I could use our scheduler or unix shell scripts.

2. Use a table based approach as ProdLife did.

3. Utilize PL/SQL and DBMS_JOB.

I chose number 3 initially and that's the focus of this post. I'll detail why I didn't use this method at the end.

The first thing I had to figure out was how to get PL/SQL to wait. Having read a few posts on AskTom, I remembered the SLEEP procedure. After a quick scan of the site, I found that it was part of the DBMS_LOCK package. I asked the DBAs to give me access so that I could begin testing.

I figured that if I could wait long enough, it would be easy to "poll" the USER_JOBS view to see when each job had finished. I'm just going to show code snippets as the whole thing can get quite long.

I first determined that the error Oracle returns for a job that doesn't exist is -23421. That will let me know when a job is complete. Next, I declared variables for each job to run.

DECLARE
  no_job        EXCEPTION;
  PRAGMA EXCEPTION_INIT( no_job, -23421 );
  l_exists      NUMBER;
  l_dollars_job NUMBER;
  l_members_job NUMBER;

First thing I do in the body is create the jobs using DBMS_JOB.SUBMIT.

BEGIN
  dbms_job.submit
    ( job       => l_dollars_job,
      what      => 'BEGIN p_mypackage.get_dollars; COMMIT; END;',
      next_date => SYSDATE );

  dbms_job.submit
    ( job       => l_members_job,
      what      => 'BEGIN p_mypackage.get_members; COMMIT; END;',
      next_date => SYSDATE );

  COMMIT;

Make sure you issue the COMMIT statement after the jobs have been submitted.

Here's the fun part. I created a loop that calls DBMS_LOCK.SLEEP and waits for 60 seconds. After each wait, I check whether the job still appears in the USER_JOBS view. This gives the jobs up to 100 minutes to complete.

FOR i IN 1..100 LOOP
  dbms_lock.sleep( 60 );

  IF l_dollars_job IS NOT NULL THEN
    BEGIN
      SELECT 1
        INTO l_exists
        FROM user_jobs
       WHERE job = l_dollars_job;

      l_exists := NULL;
    EXCEPTION
      WHEN no_data_found THEN
        l_dollars_job := NULL; --job is finished
    END;
  END IF;

  IF l_members_job IS NOT NULL THEN
    BEGIN
      SELECT 1
        INTO l_exists
        FROM user_jobs
       WHERE job = l_members_job;

      l_exists := NULL;
    EXCEPTION
      WHEN no_data_found THEN
        l_members_job := NULL; --job is finished
    END;
  END IF;

The next step is to determine when to exit the loop. Hopefully, the jobs will finish in time and move on to the next, but if not, you want to exit gracefully. Well, semi-gracefully anyway.

  IF l_dollars_job IS NULL
     AND l_members_job IS NULL
  THEN
    EXIT;
  ELSIF i = 100 THEN
    BEGIN
      dbms_job.remove( l_dollars_job );
    EXCEPTION
      WHEN no_job THEN
        NULL;
    END;

    BEGIN
      dbms_job.remove( l_members_job );
    EXCEPTION
      WHEN no_job THEN
        NULL;
    END;

    --abort run, taking too long
    raise_application_error( -20001, 'DOLLARS/MEMBERS data not loaded timely...' );
  END IF;
END LOOP;
END;

That's all there is to it.

In the end though, I was convinced not to use this method as restartability would be difficult. Perhaps this method combined with the table-based approach would be the ideal. I'll leave that for another day though.