BULK COLLECT and FORALL
I have a task to tune a little bit of SQL. It's very ETL like, but the target is not (yet) a star schema, it's pretty much a table flattened out for reporting purposes.
As I reviewed it, I noticed it went row by row, with a COMMIT inside the LOOP. That's gotta go.
Can I do this in one SQL statement? No, there's other processing that needs to be done (UPDATE two other tables before and after). Hmmm...could I just return the appropriate records into a collection? I'll have to look at that to see if it's possible.
For now though, I am going to try and use BULK COLLECT with the LIMIT clause and FORALL for the processes that occur before and after.
Let's create some data:
CREATE TABLE t( x NUMBER, y NUMBER );
INSERT INTO t( x, y )
SELECT
TRUNC( dbms_random.value( 1, 99999999 ) ),
TRUNC( dbms_random.value( 1, 100000 ) )
FROM dual
CONNECT BY level < 1001;
OK, now let's create an anonymous block, BULK COLLECTing the data from T into a PL/SQL table and then populated another table with that data:
DECLARE
TYPE t_record IS TABLE OF T%ROWTYPE;
l_table T_RECORD;
CURSOR c
IS
SELECT x, y
FROM t;
BEGIN
OPEN c;
LOOP
FETCH c
BULK COLLECT INTO l_table
LIMIT 100;
FORALL i IN 1..l_table.COUNT
INSERT INTO s ( x, y )
VALUES ( l_table(i).x, l_table(i).y );
EXIT WHEN C%NOTFOUND;
END LOOP;
CLOSE c;
END;
/
And then I run it and I get the following:
ERROR at line 18:
ORA-06550: line 18, column 16:
PLS-00436: implementation restriction: cannot reference
fields of BULK In-BIND table of records
ORA-06550: line 18, column 16:
PLS-00382: expression is of wrong type
ORA-06550: line 18, column 30:
PLS-00436: implementation restriction: cannot reference
fields of BULK In-BIND table of records
ORA-06550: line 18, column 30:
PLS-00382: expression is of wrong type
ORA-06550: line 18, column 16:
PL/SQL: ORA-22806: not an object or REF
ORA-06550: line 17, column 7:
PL/SQL: SQL Statement ignored
A quick
google search and I end up
here .
So I can bulk bind, but I have to INSERT into the table as a whole. I can't be selective.
I updated my code to this:
DECLARE
TYPE t_record IS TABLE OF T%ROWTYPE;
l_table T_RECORD;
CURSOR c
IS
SELECT x, y
FROM t;
BEGIN
OPEN c;
LOOP
FETCH c
BULK COLLECT INTO l_table
LIMIT 100;
FORALL i IN 1..l_table.COUNT
INSERT INTO s
VALUES l_table(i);
EXIT WHEN C%NOTFOUND;
END LOOP;
CLOSE c;
END;
/
I run it and it completes successfully. This is all on XE, so I wonder (hope) that 11g will allow me to do what I want (I'll be working on an 11g RAC system). I scroll down the list of google results and I find this
one which then takes me to
AskTom. The first post demonstrates that my first attempt
will work on 11g.
BULK COLLECT and FORALL are great tools if you can't do it in a single SQL statement and if you want to avoid the row by row processing.
Labels: development, howto, oracle, plsql, sql
DBMS_CRYPTO: Example
In the comments from my
previous post, Tyler and Tom both mentioned the fact that you should never store sensitive data as plain-text in the database. I used a bad example unfortunately, but Tom did mention the DBMS_CRYPTO package which I have used before.
Since I have used it and published a
working example to the OTN forums, I figured I'd put it here as well.
The goal is the encrypt a SSN and store the encrypted string (RAW) in the database. In conjunction with VPD, you should be able to easily control who can see the plain-text version of the SSN. To do that you will need to create another function which decrypts the encrypted SSN.
This is a pretty basic example, but it should help to get you started. You can read more about the
DBMS_CRYPTO package in the docs.
CREATE OR REPLACE
PACKAGE BODY p_encrypt
AS
--DO NOT FORGET TO WRAP THIS BEFORE LOADING INTO DATABASE
--IF IT IS NOT WRAPPED, THE KEY WILL BE EXPOSED
--THE WRAP UTILITY IS LOCATED IN THE \BIN DIRECTORY (WRAP.EXE)
G_KEY RAW(32) := UTL_I18N.STRING_TO_RAW( 'some_random_string_stuff_goes_here', 'AL32UTF8' );
FUNCTION encrypt_ssn( p_ssn IN VARCHAR2 ) RETURN RAW
IS
l_ssn RAW(32) := UTL_I18N.STRING_TO_RAW( p_ssn, 'AL32UTF8' );
l_encrypted RAW(32);
BEGIN
NULL;
l_encrypted := dbms_crypto.encrypt
( src => l_ssn,
typ => DBMS_CRYPTO.DES_CBC_PKCS5,
key => G_KEY );
RETURN l_encrypted;
END encrypt_ssn;
FUNCTION decrypt_ssn( p_ssn IN RAW ) RETURN VARCHAR2
IS
l_decrypted RAW(32);
BEGIN
l_decrypted := dbms_crypto.decrypt
( src => p_ssn,
typ => DBMS_CRYPTO.DES_CBC_PKCS5,
key => G_KEY );
RETURN UTL_I18N.RAW_TO_CHAR( l_decrypted, 'AL32UTF8' );
END decrypt_ssn;
END p_encrypt;
/
show errors
I mention it in the comments of the code, but do not forget to wrap your PL/SQL before you load it, otherwise someone will be able to easily see your salt/key.
Labels: dbms_crypto, howto, oracle, plsql, sql
Asynchronous Distributed HotLog - CDC Part III
This is the third installment (
one and
two)of my attempts to configure Oracle's CDC from a 9.2.0.6 source (highest release for windows) to a 10.2.0.1 (ditto) target.
After being appropriately set up with privileges (DBA's trusted me with the DBA role!) on our actual environments, 9.2.0.7 source and 10.2.0.3 target (Sun Solaris), I've moved back to trying to just get a simple proof of concept working on my machine.
That's had it's own difficulties, though I can safely say that I can manually create 9i and 10g databases manually.
Getting the databases configured properly has been my biggest challenge. I had eliminated most possibilities down to either Net Services or something to do with the JVM. I used my local listener and ruled out Net Services. It was down to the JVM. I kept getting this strange error:
ERROR at line 1:
ORA-00600: internal error code, arguments: [qccsrc_createHLSource-3], [], [],
[], [], [], [], []
ORA-06512: at "SYS.DBMS_CDC_IPUBLISH", line 133
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_CDC_PUBLISH", line 226
ORA-06512: at line 2
A trace file was generated (with the above error) but it didn't really tell me anything. I googled the phrase, nothing. I went to Metalink, nothing. It looked like a Java call though, so I tore the database down and rebuilt everything. Just in case, I added XDB, Data Mining and InterMedia in case there was some reference there.
Installing InterMedia was a pain. I kept getting a failure of a particular jar (ORA-23542?), could not resolve the class.
I
googled one of the classes that failed to see if anyone had run across this before. There was only one record found...and it was
mine from about a year and half ago! I found that very amusing.
So I went into the
\ord\im directory and found a readme.txt. As I am scanning through it I see CLASSPATH. So I add the environment variable and voila! Perfect install. That would have solved my problem from before too. I followed the docs on the install to a T. Nothing in there about CLASSPATH. Perhaps I should write them and ask for it to be added?
Anyway, thinking the best, since I had a perfectly clean installation, I tried again:
SQL> BEGIN
2 dbms_cdc_publish.create_hotlog_change_source
3 ( change_source_name => 'T',
4 description => 'Test',
5 source_database => 'ORANINE' );
6 END;
7 /
And I waited for an agonizing minute thinking I had it this time...
BEGIN
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at "SYS.DBMS_CDC_IPUBLISH", line 133
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_CDC_PUBLISH", line 226
ORA-06512: at line 2
Barnacles!
At least the ORA-0600 is gone, but I am back to where I started.
Another good thing is that I am really learning how to pour through trace files and tkprof files.
All the examples I have found make it seem so easy...why can't I get this dang thing to work?!Labels: cdc, dbms_cdc_publish, development, howto, work
Asynchronous Distributed HotLog - CDC Part II
In my first
installment, I talked about my task to configure CDC.
I've set up both the source and target databases as detailed in the documents.
I create a simple table:
CREATE TABLE t ( x NUMBER, y DATE );
From the target database I issue the following:
BEGIN
dbms_cdc_publish.create_hotlog_change_source
( change_source_name => 'T_CHANGE',
description => 'Test',
source_database => 'CDC_APP' );
END;
/
and shortly thereafter receive the following error:
BEGIN
*
ERROR at line 1:
ORA-01426: numeric overflow
ORA-06512: at "SYS.DBMS_CDC_IPUBLISH", line 133
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_CDC_PUBLISH", line 226
ORA-06512: at line 2
Elapsed: 00:00:57.23
So I set up tracing and I find the following in the trace file:
PARSING IN CURSOR #13 len=52 dep=0 uid=12 oct=47 lid=12 tim=342736069442 hv=1029988163 ad='3414b300'
BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
END OF STMT
PARSE #13:c=0,e=54,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=342736069435
BINDS #13:
kkscoacd
Bind#0
oacdty=123 mxl=4000(4000) mxlc=00 mal=00 scl=00 pre=00
oacflg=00 fl2=1000000 frm=00 csi=00 siz=4000 off=0
kxsbbbfp=08460bd8 bln=4000 avl=00 flg=15
Bind#1
oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00
oacflg=01 fl2=1000000 frm=00 csi=00 siz=24 off=0
kxsbbbfp=0844cf58 bln=22 avl=22 flg=05
value=###
An invalid number has been seen.Memory contents are :
The way I read that is "###" is being passed to dbms_output.get_lines where it should be expecting a number.
Again, the source database is 9.2.0.7 and the target is 10.2.0.1. I uninstalled the JVM because I thought that the wrong version of java was used to load it (not that I can find any reference to it being used). Changed my path, set up an ORACLE_HOME environment variable (yes, windows XP).
I'm hoping I just mangled the configuration of my local database. Tomorrow one of the DBAs is going to walk me through this in either our DEV or TEST environments (as it should be).
Fun...
Labels: cdc, howto, oracle
SAS: Create SAS Dataset from Oracle
SAS is very prevalent in our environment. When testing, I need to create some datasets for the business users so they they can do their UAT (yes, I shouldn't be involved at this point...but I am).
Here's the syntax to do that (I use SAS EG):
LIBNAME dmart oracle path=<oracle_sid> schema=<schema_name> pw=<password> user=<db_username>;
CREATE TABLE sasave.mytable
AS
SELECT * FROM dmart.mytable;
QUIT;
You can also use the following syntax:
LIBNAME dmart oracle path=<oracle_sid> schema=<schema_name> pw=<password> user=<db_username>;
DATA sasave.mytable_filtered;
SET dmart.mytable
( WHERE = id IN ( 5, 6, 7, 8 ) );
RUN;
If you want to use Oracle SQL (SAS guys and gals call it passthrough SQL), here's the syntax for that:
PROC SQL NOPRINT;
CONNECT TO oracle( username=<username> pw=<password> path="<oracle_sid>");
CREATE TABLE sasave.dual_test
AS
SELECT *
FROM connection to oracle
( SELECT * FROM DUAL );
disconnect from oracle;
QUIT;
Labels: howto, sas, sql
Instrumentation: DEBUG/LOGGING
In my
previous entry on instrumenting code I detailed the way in which you could use the DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS procedure to help monitor your long running code.
In this one, I will detail my home grown version of
Tom Kyte's debug routine. I do know that others have similar code but can't seem to find them right now.
You can find the source code
here.
Contents:
2 tables
2 sequences
1 package
1 procedure
1 build file
debug_tab allows you to turn the debugging on and off. debug_details_tab will store each line that you write to the debug routine when turned on.
Here's an example of it in practice:
CREATE OR REPLACE
FUNCTION get_age_in_months( p_id IN NUMBER ) RETURN NUMBER
IS
l_age_in_months INTEGER;
BEGIN
--instrumentation calls
debug( 'GET_AGE_IN_MONTHS' );
debug( 'P_ID: ' || p_id );
debug( 'select value into variable' );
SELECT age_in_months
INTO l_age_in_months
FROM people_tab
WHERE id = p_id;
debug( 'L_AGE_IN_MONTHS: ' || l_age_in_months );
RETURN l_age_in_months;
EXCEPTION
WHEN no_data_found THEN
debug( 'no data found' );
RETURN NULL;
END get_age_in_months;
/
show errors
I mentioned in the previous article that I had had difficulty grasping this concept initially. I think once I related it to DBMS_OUTPUT.PUT_LINE it became much more clear to me.
This simple debug routine has helped me tremendously in the last year or two that I have used it. Especially when you get nested levels of logic. It gets very hard to keep track of where you are, but with this you can run your procedure or function and then issue a SELECT against the debug_details_tab and see what was happening when.
I even began using this in my APEX applications. I would preface each line with "APEX: " and then write whatever was necessary so that I could step through the various pieces of code. It became crucial when I was doing validation on a table of records in a collection...oh so much easier.
On large systems this
will generate a lot of data. I definitely would not put this inside a loop doing some sort of batch processing, but it is helpful to find where things fail out.
It can certainly be improved on. I basically took the idea of DBMS_OUTPUT.PUT_LINE and wrote it to a table, nothing fancy there. Mr. Kyte mentions writing it to the file system as well. Since I don't have ready access to the database file system, this was the easiest.
Make your life easier and use a debug/logging routine in your code. No longer will you have to comment, the debug statements should do it for you!
Labels: code, development, howto, instrumentation, plsql, tools, utilities, work
Parallel Processing using DBMS_JOB
I found
this article through the OraNA feed by ProdLife which talked about running a report that was based on multiple queries. It reminded me of something I did awhile back.
We have this multi-step process which loads data into 2 tables that the business would use to reconcile our money in the door and our membership. Membership is on a month granularity (member month) and our money is transactional (they may have multiple transactions within a given month).
One table stores the transactions joined with our members. Not the correct grain that the business needs but useful for research. The other table summarizes the transactions to the month level and then is joined with our membership so that both are at the same granularity. Currently we're pulling across about 27 million records for members and the same for their transactions.
On the right is a basic diagram of the process.
The process initially took 8 hours to complete. Part of it was the fact that it runs
sequentially. However, not all parts of this process are dependent on one another. It isn't until the final 2 steps (Target Table 1 and Target Table 2, in yellow) that they need to run sequentially.
I wanted to speed this up and began thinking about the ways to do this (assuming as much tuning as possible had already completed).
1. I could use our scheduler or unix shell scripts.
2. Use a table based approach as ProdLife did.
3. Utilize PL/SQL and
DBMS_JOB.
I chose number 3 initially and that's the focus of this post. I'll detail why I didn't use this method at the end.
The first thing I had to figure out was how to get PL/SQL to wait. Having read a few
posts on
AskTom I remembered the SLEEP procedure. After a quick scan of the site, I found that it was part of the
DBMS_LOCK package. I asked the DBAs to give me access so that I could being testing.
I figured that if I could wait long enough, it would be easy to "poll" the USER_JOBS
view to see when it had finished. I'm just going to show code snippets as the whole thing can get quite long.
I first determined that the error returned from Oracle for a job not there is -23241.
That will let me know when it is complete. Next, I declared variables for each job to run.
DECLARE
no_job EXCEPTION;
PRAGMA EXCEPTION_INIT( no_job, -23421 );
l_exists NUMBER;
l_dollars_job NUMBER;
l_members_job NUMBER;
First thing I do in the body is create the jobs using DBMS_JOB.SUBMIT.
BEGIN
dbms_job.submit
( job => l_dollars_job,
what => 'BEGIN p_mypackage.get_dollars; COMMIT; END;',
next_date => SYSDATE );
dbms_job.submit
( job => l_members_job,
what => 'BEGIN p_mypackage.get_members; COMMIT; END;',
next_date => SYSDATE );
COMMIT;
Make sure you issue the COMMIT statement after the jobs have been submitted.
Here's the fun part. I created a loop that would call DBMS_LOCK.SLEEP and wait for 60 seconds. After the wait has ended, I check to see whether that job remains in the USER_JOBS table. This allows the jobs to complete in 100 minutes.
FOR i IN 1..100 LOOP
dbms_lock.sleep( 60 );
IF l_dollars_job IS NOT NULL THEN
BEGIN
SELECT 1
INTO l_exists
FROM user_jobs
WHERE job = l_dollars_job;
l_exists := NULL;
EXCEPTION
WHEN no_data_found THEN
l_dollars_job := NULL;--job is finished
END;
END IF;
IF l_members_job IS NOT NULL THEN
BEGIN
SELECT 1
INTO l_exists
FROM user_jobs
WHERE job = l_members_job;
l_exists := NULL;
EXCEPTION
WHEN no_data_found THEN
l_members_job := NULL;--job is finished
END;
END IF;
The next step is to determine when to exit the loop. Hopefully, the jobs will finish in time and move on to the next, but if not, you want to exit gracefully. Well, semi-gracefully anyway.
IF l_dollars_job IS NULL
AND l_members_job IS NULL
THEN
EXIT;
ELSIF i = 100 THEN
BEGIN
dbms_job.remove( l_dollars_job );
EXCEPTION
WHEN no_job THEN
NULL;
END;
BEGIN
dbms_job.remove( l_members_job );
EXCEPTION
WHEN no_job THEN
NULL;
END;
--abort run, taking too long
raise_application_error( -20001, 'DOLLARS/MEMBERS data from not loaded timely...' );
END IF;
END LOOP;
END;
That's all there is to it.
In the end though, I was convinced not to use this method as restartability would be difficult. Perhaps this method combined with the table-based approach would be the ideal. I'll leave that for another day though.
Labels: dbms_job, development, howto, plsql
Swapping Partitions
Recently I learned a pretty cool trick, swapping partitions.I'm sure I read it originally on AskTom, but I never had a need for it. Now that I am in data warehousing though, moving millions and millions of records around takes time...a lot of time.On this recent project, I had a process which brought in the data from a different database. There were some minor transformations on that data including the use of an analytic function to de-dup the data (I would assign a 1 to the first record matching my criteria and that's what I would use for everything downstream).So I have essentially six tables. One which is the target for the source data, four which handle the transformed data and another one which holds all of that transformed data. Initially I would perform a INSERT INTO...SELECT FROM...but it would take upwards of three hours for this one particular set of data (~26 million records), and I was doing that twice (six hours).Sitting with a colleague trying to figure out a way to speed up the process, we came across an article on AskTom which mentioned swapping (or exchanging) partitions with a regular table. The final table (of the six) is partitioned by line of business, a perfect candidate.So I issue the statement found there:ALTER TABLE target_table EXCHANGE PARTITION gmd_lob WITH TABLE source_table;That didn't work because the table structures did not match. So I went about changing the table structure to match that of the target table.I issued the statement again:ALTER TABLE target_table EXCHANGE PARTITION gmd_lob WITH TABLE source_table;Success! I then went on to change the other three tables structure to match that of the target table. What had taken three to four hours now took less than five minutes!Labels: database, howto, oracle, partition