Tuesday, June 16, 2009

PL/SQL: Split URL Parameters

I've always had this phobia of anything to do with strings. Might be because I find it hard to think that way...and by "that way" I have no idea what I'm talking about.

Whenever I come across this problem, my first instinct is to go through every character and build a string, something like this:
DECLARE
l_string VARCHAR2(100) := 'THIS_LITTLE_STRING';
l_character VARCHAR2(1);
BEGIN
FOR i IN 1..LENGTH( l_string ) LOOP
l_character := l_character || SUBSTR( l_string, i, 1 );
END LOOP;
END;
/
Of course that means I have to go through the loop at least 18 times. How could I loop less?

Well, I would need to know the delimiter first, in this case, it will be the underscore character:
DECLARE
l_string VARCHAR2(100) := 'THIS_LITTLE_STRING';
l_character VARCHAR2(1);
l_delimiter VARCHAR2(1) := '_';
BEGIN
...
OK, that helps. So what can I do with that?
DECLARE
l_string VARCHAR2(100) := 'THIS_LITTLE_STRING';
l_new_string VARCHAR2(100) := l_string;
l_token VARCHAR2(30);
l_character VARCHAR2(1);
l_delimiter VARCHAR2(1) := '_';
BEGIN
LOOP
EXIT WHEN l_new_string IS NULL;
l_token := SUBSTR( l_new_string, 1, INSTR( l_new_string, '_', 1 ) );
...
OK, now I remember what I always get confused. There seems to be this endless stream of SUBSTR and INSTR involved. It's difficult to follow...for me anyway. So here's what I've come up with for now. I am aware of regular expressions and the DBMS_UTILITY.COMMA_TO_TABLE procedures, but for some reason, I like to reinvent the wheel. No, not really, I just find it challenging. More challenging would be for me to start using regular expressions...Anyway, here goes my attempt at splitting a URL string:
DECLARE
TYPE my_table IS TABLE OF VARCHAR2(100);
l_table MY_TABLE := MY_TABLE();
l_string VARCHAR2(1000) := 'TESTING=YES&&BOLLOCKS=SOMETHING&&&testing=DF';
l_keyvalue VARCHAR2(100);
l_delimiter VARCHAR2(5) := '&&';
l_delimiter_length INTEGER := LENGTH( l_delimiter );
l_counter INTEGER := 0;
l_new VARCHAR2(4000);
l_start INTEGER := 1;
l_end INTEGER := 0;
BEGIN
IF SUBSTR( l_string, -1, 1 ) <> l_delimiter THEN
l_string := l_string || l_delimiter;
END IF;

l_new := l_string;

LOOP
l_counter := l_counter + 1;
l_end := INSTR( l_new, l_delimiter, 1 );
l_keyvalue := SUBSTR( l_new, 1, l_end - 1 );
EXIT WHEN l_keyvalue IS NULL;
l_table.EXTEND(1);
l_table(l_counter) := l_keyvalue;
l_start := l_start + ( l_end + ( l_delimiter_length - 1 ) );
l_new := SUBSTR( l_string, l_start );
END LOOP;
p( 'table counter: ' || l_table.COUNT );
FOR I IN 1..l_table.COUNT LOOP
p( 'string: ' || l_table(i) );
END LOOP;
END;
/

CJUSTICE@TESTING>@T
table counter: 3
string: TESTING=YES
string: BOLLOCKS=SOMETHING
string: &testing=DF

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03
Voila! My own URL String Tokenizer.

Now I just have to write something that will split up the key/value pairs...

5 comments:

moleboy said...

While I don't use RegExp, I really should, especially as I seem to be doing a great deal of string parsing. I've gotten handy with INSTR and SUBSTR etc, but yeah, RegExp would be better.

I find that I end up often doing the character by character thing when I do VBA work, primarily because I don't really know VBA, but I need to use it a great deal (annoying, yes). But I'm getting better there too.

This kind of char by char is the parsing equivilent to RBAR.
;)

oraclenerd said...

@moleboy

Had to look up RBAR, is that a SQL Serverism?

Even on the odd occasion I use Java I go straight for StringTokenizer (I've even used a StreamTokenizer as well). For some reason I knew, or was aware, of these already solved problems and never ventured down the char by char path.

moleboy said...

Dunno the derivation of RBAR. I first stumbled into it when my brother (also a database geek and significantly smarter than I am...at least with theory) was talking about some insane update he had to do to 15 million rows and he caught himself writing a pl/sql procedure to do it RBAR instead of a sql statement.

It was actually an interesting conversation weighing the pros and cons of the "slow but easy to write and support" PL/SQL method and the "complex but fast and efficient" SQL update.

blogje said...

The complex, but efficent sql to do this would be:

select substr( tmp, 1, instr( tmp || sep, sep ) - 1) split
from ( select substr( url, instr( sep || url, sep, 1, level ) ) tmp
, sep
from ( select 'TESTING=YES&&BOLLOCKS=SOMETHING&&testing=DF' url
, '&&' sep
from dual
)
connect by level <= length( url ) - length( replace( url, sep, '' ) ) + 1
)

Anonymous said...

Hi ,

Really need some help on parsing URL string with Oracle Regular Expression.

The new_request_page_str needs to be broken up like this:

Example 1:
https://www23.fs.ml.com/marketdata/marketdataisapilib.dll

Result 1:
Web_Page_Str: https://www23.fs.ml.com/marketdata/marketdataisapilib.dll
Domain_Name: fs.ml.com
Website_name: www23
Resource_Path: /marketdata/marketdataisapilib.dll
Directory_1_name: /marketdata
Directory_2_name: null
.
.
.

Directory_10_name: null
Resource_name: marketdataisapilib.dll
Resource_type: dll

Example 2:
https://www23.fs.ml.com/marketdata/allquote.js

Results 2:
Web_Page_Str: https://www23.fs.ml.com/marketdata/allquote.js
Domain_Name: fs.ml.com
Website_name: www23
Resource_Path: /marketdata/allquote.js
Directory_1_name: /marketdata
Directory_2_name: null
.
.
.

Directory_10_name: null
Resource_name: allquote.js
Resource_type: js

Example 3:
https://www23.fs.ml.com/images/downarrow.gif

Result 3:
Web_Page_Str: https://www23.fs.ml.com/images/downarrow.gif
Domain_Name: fs.ml.com
Website_name: www23
Resource_Path: /images/downarrow.gif
Directory_1_name: /images
Directory_2_name: null
.
.
.

Directory_10_name: null
Resource_name: downarrow.gif
Resource_type: gif