ORACLENERD: PL/SQL: Split URL Parameters

Tuesday, June 16, 2009

PL/SQL: Split URL Parameters

I've always had this phobia of anything to do with strings. Might be because I find it hard to think that way...and by "that way" I have no idea what I'm talking about.

Whenever I come across this problem, my first instinct is to go through every character and build a string, something like this:

DECLARE
  l_string VARCHAR2(100) := 'THIS_LITTLE_STRING';
  l_character VARCHAR2(1);
BEGIN
  FOR i IN 1..LENGTH( l_string ) LOOP
    l_character := l_character || SUBSTR( l_string, i, 1 );
  END LOOP;
END;
/

Of course that means I have to go through the loop at least 18 times. How could I loop less?

Well, I would need to know the delimiter first, in this case, it will be the underscore character:

DECLARE
  l_string VARCHAR2(100) := 'THIS_LITTLE_STRING';
  l_character VARCHAR2(1);
  l_delimiter VARCHAR2(1) := '_';
BEGIN
...

OK, that helps. So what can I do with that?

DECLARE
  l_string VARCHAR2(100) := 'THIS_LITTLE_STRING';
  l_new_string VARCHAR2(100) := l_string;
  l_token VARCHAR2(30);
  l_character VARCHAR2(1);
  l_delimiter VARCHAR2(1) := '_';
BEGIN
  LOOP
    EXIT WHEN l_new_string IS NULL;
    l_token := SUBSTR( l_new_string, 1, INSTR( l_new_string, '_', 1 ) );
...

OK, now I remember what I always get confused. There seems to be this endless stream of SUBSTR and INSTR involved. It's difficult to follow...for me anyway. So here's what I've come up with for now. I am aware of regular expressions and the DBMS_UTILITY.COMMA_TO_TABLE procedures, but for some reason, I like to reinvent the wheel. No, not really, I just find it challenging. More challenging would be for me to start using regular expressions...Anyway, here goes my attempt at splitting a URL string:

DECLARE
  TYPE my_table IS TABLE OF VARCHAR2(100);
  l_table MY_TABLE := MY_TABLE();
  l_string VARCHAR2(1000) := 'TESTING=YES&&BOLLOCKS=SOMETHING&&&testing=DF';
  l_keyvalue VARCHAR2(100);
  l_delimiter VARCHAR2(5) := '&&';
  l_delimiter_length INTEGER := LENGTH( l_delimiter );
  l_counter INTEGER := 0;
  l_new VARCHAR2(4000);
  l_start INTEGER := 1;
  l_end INTEGER := 0;
BEGIN
  IF SUBSTR( l_string, -1, 1 ) <> l_delimiter THEN
    l_string := l_string || l_delimiter;
  END IF;

  l_new := l_string;

  LOOP
    l_counter := l_counter + 1;
    l_end := INSTR( l_new, l_delimiter, 1 );
    l_keyvalue := SUBSTR( l_new, 1, l_end - 1 );
    EXIT WHEN l_keyvalue IS NULL;
    l_table.EXTEND(1);
    l_table(l_counter) := l_keyvalue;
    l_start := l_start + ( l_end + ( l_delimiter_length - 1 ) );
    l_new := SUBSTR( l_string, l_start );
  END LOOP;
  p( 'table counter:  ' || l_table.COUNT );
  FOR I IN 1..l_table.COUNT LOOP
    p( 'string:  ' || l_table(i) );
  END LOOP;
END;
/

CJUSTICE@TESTING>@T
table counter:  3
string:  TESTING=YES
string:  BOLLOCKS=SOMETHING
string:  &testing=DF

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.03

Voila! My own URL String Tokenizer.

Now I just have to write something that will split up the key/value pairs...

5 comments:

moleboy said...: While I don't use RegExp, I really should, especially as I seem to be doing a great deal of string parsing. I've gotten handy with INSTR and SUBSTR etc, but yeah, RegExp would be better.

I find that I end up often doing the character by character thing when I do VBA work, primarily because I don't really know VBA, but I need to use it a great deal (annoying, yes). But I'm getting better there too.

This kind of char by char is the parsing equivilent to RBAR.
;); June 16, 2009 at 10:35 AM
oraclenerd said...: @moleboy

Had to look up RBAR, is that a SQL Serverism?

Even on the odd occasion I use Java I go straight for StringTokenizer (I've even used a StreamTokenizer as well). For some reason I knew, or was aware, of these already solved problems and never ventured down the char by char path.; June 16, 2009 at 6:04 PM
moleboy said...: Dunno the derivation of RBAR. I first stumbled into it when my brother (also a database geek and significantly smarter than I am...at least with theory) was talking about some insane update he had to do to 15 million rows and he caught himself writing a pl/sql procedure to do it RBAR instead of a sql statement.

It was actually an interesting conversation weighing the pros and cons of the "slow but easy to write and support" PL/SQL method and the "complex but fast and efficient" SQL update.; July 30, 2009 at 8:26 AM
blogje said...: The complex, but efficent sql to do this would be:

select substr( tmp, 1, instr( tmp || sep, sep ) - 1) split
from ( select substr( url, instr( sep || url, sep, 1, level ) ) tmp
, sep
from ( select 'TESTING=YES&&BOLLOCKS=SOMETHING&&testing=DF' url
, '&&' sep
from dual
)
connect by level <= length( url ) - length( replace( url, sep, '' ) ) + 1
); November 3, 2009 at 4:20 PM
Anonymous said...: Hi ,

Really need some help on parsing URL string with Oracle Regular Expression.

The new_request_page_str needs to be broken up like this:

Example 1:
https://www23.fs.ml.com/marketdata/marketdataisapilib.dll

Result 1:
Web_Page_Str: https://www23.fs.ml.com/marketdata/marketdataisapilib.dll
Domain_Name: fs.ml.com
Website_name: www23
Resource_Path: /marketdata/marketdataisapilib.dll
Directory_1_name: /marketdata
Directory_2_name: null
.
.
.

Directory_10_name: null
Resource_name: marketdataisapilib.dll
Resource_type: dll

Example 2:
https://www23.fs.ml.com/marketdata/allquote.js

Results 2:
Web_Page_Str: https://www23.fs.ml.com/marketdata/allquote.js
Domain_Name: fs.ml.com
Website_name: www23
Resource_Path: /marketdata/allquote.js
Directory_1_name: /marketdata
Directory_2_name: null
.
.
.

Directory_10_name: null
Resource_name: allquote.js
Resource_type: js

Example 3:
https://www23.fs.ml.com/images/downarrow.gif

Result 3:
Web_Page_Str: https://www23.fs.ml.com/images/downarrow.gif
Domain_Name: fs.ml.com
Website_name: www23
Resource_Path: /images/downarrow.gif
Directory_1_name: /images
Directory_2_name: null
.
.
.

Directory_10_name: null
Resource_name: downarrow.gif
Resource_type: gif; July 19, 2010 at 6:57 PM