ORACLENERD twitter/oraclenerd view chet justice's profile on LinkedIn feed oraclenerd t-shirts Unemployment Clock:

  Logging, Debugging, Instrumentation and Profiling
It started with this tweet from Cary Millsap [company | book | blog | twitter ]



That took me to this article on his blog, Multitasking: Productivity Killer.

I then began to look through his blog history and one in particular caught my attention, On the Usefulness of Software Instrumentation, from February of this year. I left a comment:
I'm having a conceptual crisis.

Profiling. Instrumenting. Debugging. Logging.

What's the difference? What's the same?

I have one AOL friend that says adding lines to code is debug code (meaning bad), while others say it is instrumentation (meaning good). Now I've got profiling to worry about? Come on!

Can you help a poor guy out? Maybe do a post on it? If you already have, point me to it? Anything? Help? :)
To which he responded:
Hmm, good one. I don't have enough time left today to work on it before close of business, but let's chat about it via email. That ought to help you out, and then I'll be glad to keep thinking about it and write something up. Write me via http://method-r.com/contact-us. —Cary
What follows is the conversation between Cary and I, reprinted with here with his permission.

Me: July 15, 2009, 10:09 PM
As directed...;)

I'm a big believer in Instrumentation, until recently, when I got all confused (shocker).

I was having a conversation with an old colleague...I asked about Instrumentation, and he indicated that Instrumentation created the ability to essentially monitor your system so that operational folks could adjust (more hardware, bandwidth, etc). My clumsy example, logins typically take 1ms, now they are taking 10ms. He agreed, sort of. The best I could figure out, from his definition, was that Instrumentation of software was like SixSigma is for process. (I only have a vague understanding of SixSigma).

Then I stumbled on your article. BTW, I think this is the Tom Kyte article you were referring to.

**Logging, I get. I think. Logging allows you to capture exceptions or other "bad" things and then, optionally, notify someone if it is severe enough.
**Debugging. It's a superset of Logging, I think. You are technically logging, but you may or may not have the notifications or ability to take immediate action.
**Instrumentation. ?
**Profiling. ?

I used to think I knew what Instrumentation was, but now I'm not so sure.

I think I'm fairly good at designing, then writing code for Oracle database applications. I consider security in my design and code stages. I consider future enhancements in design and code. I try to think of it all. I've even used debug statements liberally applied to all of my code. I want to do it the right way though. So my company brings you in to take a look, an audit, a pre-emptive strike if you will, and you say, "Man, this is some pretty awesome code here! Where's my check?."

I would love to see you write something up, 5 different articles, one on each topic then put them all together in a final post. Just kidding of course. Whatever you do write though, I'll be there asking questions!
Cary: July 15, 2009 11:36 PM
Here are some thoughts. I'd like you to press back with your questions and comments before I consider putting this up as a blogpost.

I think a good definition for debug code is, it's the code you put into your application when you'd rather have your code tell you what it's doing than use a proper debugger (with breakpoints and watches and all that). It's the printf("Here I am\n") statements that you throw into your code to help you figure out why it's not doing what you thought it was going to do. I use both debuggers and debug code. I use debug code anywhere I feel like a program I'm shipping to someone may need to report some information to me after it's deployed. Actually, I take that back...

When debug code gets fancy and becomes a permanent fixture in your code, like "if ($debug =~ /[gex]/i) { sprintf("%.6f entering %s\n", gettimeofday(), caller()); }", that's when I think it's really logging code. I think of logging as a permanent part of your code that adds some kind of operational feature to your code. It's more for the person managing how your code runs than it is for the person who runs it as a business function. Logs answer questions like, "Show me when the different phases of last night's process ran." The Oracle alert file is a log. Things that log4j prints out are logs.

Instrumentation, I'd say, is logging with a time-oriented purpose. I think people use the word pretty loosely to mean anything they want it to mean, so maybe instrumentation includes everything that's not a core function of the application. But, I like the word instrumentation. It puts a picture in my mind of a Starrett micrometer measuring the thickness of some Swiss watch part to within 0.0001 of an inch. It seems to me like the kind of a word that should mean something more careful than just "all code that's not core function code." When I say instrumentation, I mean logging with a focus on where time has gone.

Now, profile code is a different matter entirely; it is a very carefully restricted subset of instrumentation. A profile is a spanning, non-overlapping account of response time. Spanning means that the sum of the elements has to be greater than or equal to the response time the profile describes. Non-overlapping means that the sum of the elements has to be less than or equal to the response time the profile describes. In other words, a profile has to describe exactly a response time event; not less time than that, nor more. (For tasks with parallelized subtasks in them, the overall task has a profile whose interesting part requires drilling down into the individual serial [but executed in parallel] subtasks.) So, profile code is a type of instrumentation that is logging code that explains time in such a manner that you can create a profile from it.

The 10046 trace code path inside the Oracle Database barely qualifies as profile code, because a profile can be constructed from its output data, but I say "barely," because it's very, very difficult. It may be easy to get it 80% right, like examples in Jeff's and my Optimizing Oracle Performance book explain, but the last 20% is a huge chore. That last 20% is most of why the Method R Profiler (which was called the Hotsos Profiler when I wrote about it in OOP) has so many man-years of labor in it. Good profiling code is difficult to write (especially because one of your constraints needs to be to minimize the measurement intrusion effect of the profiling code itself), but it's like any other functional requirement. First, you have to know what you want, which requires trial-and-error before you get the design right. Then it becomes a software development project for a specific function, which you build just like you'd build anything else.

Helpful?
Me: July 16, 2009 12:11 PM
OK, Debugging and Logging are much more clear now.

You said:

It seems to me like the kind of a word that should mean something more careful than just "all code that's not core function code."

Would you define core-function code? Would you want to Instrument everything? Or is that the piece I am missing?

Also:

Instrumentation, I'd say, is logging with a time-oriented purpose.

and then

When I say instrumentation, I mean logging with a focus on where time has gone.

I'm not sure I'm following you with the time piece (pun intended). If I have Debug code which has now become Logging code, the table (or file, etc) would also capture the current date and time (or timestamp) right? You could use that to get metrics and such. Or am I missing something again?

On Profiling:

(especially because one of your constraints needs to be to minimize the measurement intrusion effect of the profiling code itself)

Kind of like introducing a TV crew or video camera? Where people act different because the camera is there? Isn't that an anthropology concept as well?

Profiling code lives outside of the code? Is that the concept? You have to be at a pretty low level to capture that stuff don't you?

I've got a ways to go.

If I didn't say it, I truly appreciate you taking the time for this.
Cary: July 16, 2009 11:11 AM
Inline below...

Cary

On Thu, Jul 16, 2009 at 11:11 AM, chet justice wrote:
I'm going to take a gander first, without help.

OK, Debugging and Logging are much more clear now.

You said:

It seems to me like the kind of a word that should mean something more careful than just "all code that's not core function code."

Would you define core-function code? Would you want to Instrument everything? Or is that the piece I am missing?


I would define "core-function code" loosely as the stuff that non-technical people talk about when they describe what your code does.

There are engineering trade-offs at every turn. Oracle example: We have 10046 tracing, which turns on instrumentation for every dbcall and oscall executed by the Oracle kernel. Some people shy away from using this instrumentation level because they're worried about the measurement intrusion effect. I believe that the intrusion is almost always well worth the cost. Oracle also provides 10200 tracing, which turns on instrumentation for every consistent read executed by the Oracle kernel. Now there is some measurement intrusion, because it makes the kernel emit a long line of trace data every time it executes the CR code path. Oracle provides both (and a couple hundred more, too), because sometimes you need both. But, thankfully, they gave us the ability to turn them on/off independently, so the run-time operational manager gets to make the trade-off decision about how much measurement intrusion he's willing to live with.

So, in response to your "instrument everything?" question, it's an artistic answer with an economic constraint: You need to instrument everything that will be in your business's best interest to have instrumented when the code goes out. The economic constraint is that you don't want to instrument something you'll never need to activate in the field: that's a waste of development time that could have been better allocated to writing some other function. I say artistic because you can't always know what parts of your code are going to need the instrumentation. You have to feel your way through that, and sometimes events over time will very much surprise you.

Also:

Instrumentation, I'd say, is logging with a time-oriented purpose.

and then

When I say instrumentation, I mean logging with a focus on where time has gone.

I'm not sure I'm following you with the time piece (pun intended). If I have Debug code which has now become Logging code, the table (or file, etc) would also capture the current date and time (or timestamp) right? You could use that to get metrics and such. Or am I missing something again?


No, I think you're fine there. What I really need to say, I guess, is that the word instrumentation might have so many different loose definitions that it might be a good word to stay away from, except as a general category marker.

On Profiling:

(especially because one of your constraints needs to be to minimize the measurement intrusion effect of the profiling code itself)

Kind of like introducing a TV crew or video camera? Where people act different because the camera is there? Isn't that an anthropology concept as well?


Yes. Except with machines, the intrusion effect is much easier to measure and predict. There's the occasional race condition problem that the presence of instrumentation can make more difficult to reproduce, so it's still an interesting, potentially nonlinear effect. (I'm talking about the times when running without tracing takes 10 seconds. Then you turn trace on, and your thing never takes more than 2 seconds. That happens sometimes.)

Profiling code lives outside of the code? Is that the concept? You have to be at a pretty low level to capture that stuff don't you?

There are two parts to profiling: (1) the emission of detailed event timings, and (2) the aggregation of those timings into something usable for an analyst. Part #1 is what happens when you run a.out after doing a "cc -pg hello.c". The execution writes out detailed timing data to a file called gmon.out. Then part #2 happens when you run gprof upon gmon.out to produce a readable, analyzable report. With Oracle, part #1 is turning on trace; part #2 is running something like our Profiler. Part #1 happens inside your code. Part #2 happens outside your code.

I've got a ways to go.

If I didn't say it, I truly appreciate you taking the time for this.


You're welcome. I learn by explaining; I'm glad you consider me a good person to learn with this week.
Me: July 16, 2009 2:30 PM
Ah ha! I think I have it now.

Profiling is using Logging (possibly Debugging) information that has been turned on (possible performance impact, which is at least measurable), reading that Data and then turning it into Information (report)?

When you say artistic, do you mean something like processing transactions (not database transactions but financial)? That would be a good candidate wouldn't it?

Cary: July 16, 2009 3:10 PM
What I mean by artistic is that there aren't any hard-and-fast rules that a programmer can use to determine what code paths to instrument, and which not to. And there are no hard-and-fast rules describing how to segment the instrumentation; e.g., 10046 and 10200 should be separately controlled? or should both instrumentation types be combined into one switch? I use the "art" word to denote tasks for which there's no deterministic list of steps that an inexperienced, untalented person can follow to do just as good a job as an experienced, talented one.

You might like to read the paper that I've attached.
I'm still looking for the public link to this paper. I will update this post when I find it.


Me: July 21, 2009 12:47 AM
Interesting read. I haven't quite wrapped my head around, but I've only read it once.

So to sum it up (sort of), I, as a PL/SQL Developer can't really do that much in regards to profiling? When you talk about profiling, you're talking at the kernel level right? If not...then I'm lost! :)

Cary: July 21, 2009 9:03 AM
Well, as a PL/SQL programmer, you have 10046 data, which can be assembled into a profile. And you have DBMS_PROFILER, which gives you a response time profile partitioned by PL/SQL line number. As Jeff and I suggested in our book, the analysis process there is:

1. Create a response time profile with 10046 data.
2. If the majority of your time is spent in an EXEC call upon a PL/SQL block (as opposed to a PARSE, EXEC, or FETCH of a SQL statement within the block), then use DBMS_PROFILER to figure out which line(s) of PL/SQL are causing your performance problem.
Me: July 21, 2009 11:36 AM
duh...dbms_profiler. I am a bit slow on occasion.

Very cool stuff. Part of my lack of understanding is that I've never really used a debugger or profiler. I think my first exposure to it (looking back now) was just a couple of months ago and JRockit, the ability to "look" and see what was running in the JVM. I've done Java programming before, but I never advanced past the barely intermediate stage. PL/SQL doesn't (or hasn't to my knowledge until now) made things easy...from a command line anyway.

So I profiled a simple session calling DBMS_LOCK.SLEEP. Pretty cool stuff.
SYS@TESTING>BEGIN
2 dbms_profiler.start_profiler
3 ( run_comment => 'JUST A TEST' );
4
5 dbms_lock.sleep(60);
6
7 dbms_profiler.stop_profiler;
8 END;
9 /

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.01
SYS@TESTING>SELECT * FROM plsql_profiler_runs;

RUNID RELATED_RUN RUN_OWNER RUN_DATE RUN_COMMENT
---------- ----------- ------------ --------- ------------
0 SYS 21-JUL-09 JUST A TEST

1 row selected.

SYS@TESTING>SELECT * FROM plsql_profiler_units;

RUNID UNIT_NUMBER UNIT_TYPE UNIT_OWNER UNIT_NAME
---------- ----------- ----------------- ------------- -----------
1 1 PACKAGE BODY SYS DBMS_PROFILER
1 2 ANONYMOUS BLOCK
1 3 PACKAGE SPEC SYS DBMS_LOCK
1 4 PACKAGE BODY SYS DBMS_LOCK

4 rows selected.

SYS@TESTING>SELECT * FROM plsql_profiler_data;

RUNID UNIT_NUMBER LINE# TOTAL_OCCUR TOTAL_TIME MIN_TIME MAX_TIME SPARE1 SPARE2
---------- ----------- ---------- ----------- ---------- ---------- ---------- ---------- ----------
1 1 1 0 0 0 0
1 1 8 0 0 0 0
1 1 16 0 1069 1069 1069
1 1 32 0 654 654 654
1 1 41 0 0 0 0
1 1 49 0 0 0 0
1 1 57 0 0 0 0
You can find more information on DBMS_PROFILER here.

I have not yet completely wrapped my mind around this yet. I think it might be in part due to the fact that I have never really programmed in another language like C or Java (well, a little) where you run the debugger and profiler against your code.

I don't know if this conversation makes me look like a complete dolt either...I don't really care if it does. Cary graciously made himself available to answer my questions and tolerated my...inexperience in this area. It's no wonder he is held in such high regard in the Oracle community. Not only does he know his stuff, he's more than willing to share it.

Hopefully some of you will get something out of this conversation. I'm still hoping that Cary will do his own posts in the future, but I am grateful he let me share this with you.

Labels: , , , ,

 
  What I Learned Today
I was deploying a project to our UAT environment and having problems getting the script to finish. We have these weird timeout issues that, if connected, will boot you out after 30 minutes or so. Some believe it's a firewall issue and others believe it's a security control.

This particular piece of code would bump me out after about an hour, 2/3rds of the way through.

I tried to spruce up the anonymous block by instrumenting the code with dbms_application_info.set_session_longops, removing unnecessary commits and replacing loops with set based operations.

I get it all prettied up and kick it off then open up a JDeveloper User-Defined Report to watch it run. The SQL is (from the linked post above):
SELECT 
username,
sid,
serial#,
TO_CHAR( start_time, 'MM/DD/YYYY HH24:MI:SS' ) start_ti,
time_remaining rem,
elapsed_seconds ela,
ROUND( ( sofar / REPLACE( totalwork, 0, 1 ) ) * 100, 2 ) per,
sofar,
totalwork work,
message,
target_desc
FROM v$session_longops
WHERE start_time >= SYSDATE - 1
ORDER BY start_time DESC
But I don't see anything returning. WTF?

Go talk to Mr. DBA and asked him if he could help me monitor. He opens up OEM, but that's not what I was thinking. I was thinking of the Toad --> DBA --> Session browser. Finally I talk him into that and there's the code running. WTF?

"Why can't I see it? I'm using v$session_longops too."

"Ah yes young Padawan, this is RAC, you must use the GV$ tables."

Labels:

 
  Instrumentation: DEBUG/LOGGING
In my previous entry on instrumenting code I detailed the way in which you could use the DBMS_APPLICATION_INFO.SET_SESSION_LONGOPS procedure to help monitor your long running code.

In this one, I will detail my home grown version of Tom Kyte's debug routine. I do know that others have similar code but can't seem to find them right now.

You can find the source code here.

Contents:
2 tables
2 sequences
1 package
1 procedure
1 build file

debug_tab allows you to turn the debugging on and off. debug_details_tab will store each line that you write to the debug routine when turned on.

Here's an example of it in practice:

CREATE OR REPLACE
FUNCTION get_age_in_months( p_id IN NUMBER ) RETURN NUMBER
IS
l_age_in_months INTEGER;
BEGIN
--instrumentation calls
debug( 'GET_AGE_IN_MONTHS' );
debug( 'P_ID: ' || p_id );
debug( 'select value into variable' );

SELECT age_in_months
INTO l_age_in_months
FROM people_tab
WHERE id = p_id;

debug( 'L_AGE_IN_MONTHS: ' || l_age_in_months );

RETURN l_age_in_months;

EXCEPTION
WHEN no_data_found THEN
debug( 'no data found' );
RETURN NULL;
END get_age_in_months;
/
show errors


I mentioned in the previous article that I had had difficulty grasping this concept initially. I think once I related it to DBMS_OUTPUT.PUT_LINE it became much more clear to me.

This simple debug routine has helped me tremendously in the last year or two that I have used it. Especially when you get nested levels of logic. It gets very hard to keep track of where you are, but with this you can run your procedure or function and then issue a SELECT against the debug_details_tab and see what was happening when.

I even began using this in my APEX applications. I would preface each line with "APEX: " and then write whatever was necessary so that I could step through the various pieces of code. It became crucial when I was doing validation on a table of records in a collection...oh so much easier.

On large systems this will generate a lot of data. I definitely would not put this inside a loop doing some sort of batch processing, but it is helpful to find where things fail out.

It can certainly be improved on. I basically took the idea of DBMS_OUTPUT.PUT_LINE and wrote it to a table, nothing fancy there. Mr. Kyte mentions writing it to the file system as well. Since I don't have ready access to the database file system, this was the easiest.

Make your life easier and use a debug/logging routine in your code. No longer will you have to comment, the debug statements should do it for you!

Labels: , , , , , , ,

 
  Instrumentation: DBMS_APPLICATION_INFO
Instrumentation has something that I have come to rely on fairly heavily. I believe I first read about it on asktom, but the one that really spurred me on was this post on instrumentation on his personal blog.

Initially, I couldn't really wrap my head around instrumentation. I don't know why it was so difficult; I had a similar problem with sessions when I first started my career. I look back now and it just seems so obvious.

Now that I am doing datawarehouse work, nothing is fast. Fast to me is now one hour to load 30 or 40 million records. No more split second queries for me.

We currently use no tools. It's straight PL/SQL. Instrumentation of the code is ideal. Actually, it's more instrumentation to aid monitoring. The tool most easily used is provided by Oracle in the DBMS_APPLICATION_INFO package.

There are three subprograms that I use most, SET_MODULE, SET_ACTION and most importantly SET_SESSION_LONGOPS. I hadn't started using it until this year, I mainly stuck to the first two. SET_SESSION_LONGOPS is now part of my procedure/function template I've created in JDeveloper.

What it allows you to do is set a row in the v$session_longops view (I know it's not actually putting the row in the view...it's the underlying table, but I digress). You can then monitor how your job is doing.

Here's an example:

dbms_application_info.set_session_longops
( rindex => g_index,
slno => g_slno,
op_name => 'GETTING MEMBER DATA',
sofar => 0,
totalwork => l_table.COUNT + 1,
target_desc => 'GETTING MEMBER DATA' );

g_index and g_slno are global variables in the package. l_table is a PL/SQL TABLE OF VARCHAR2.

Now you can monitor the progress of your job in v$session_longops!

Here's the query I use:

SELECT
username,
sid,
serial#,
TO_CHAR( start_time, 'MM/DD/YYYY HH24:MI:SS' ) start_ti,
time_remaining rem,
elapsed_seconds ela,
ROUND( ( sofar / REPLACE( totalwork, 0, 1 ) ) * 100, 2 ) per,
sofar,
totalwork work,
message,
target_desc
FROM v$session_longops
WHERE start_time >= SYSDATE - 1
ORDER BY start_time DESC


Now you too can sit for hours and watch your job move incrementally forward!

But seriously, it does help tremendously to know where a job is at. You can further use the SET_MODULE and SET_ACTION calls to see a specific point in the processing (inside a loop).

Here's the code in context:


PROCEDURE get_member_data
IS
l_exists INTEGER;
TYPE table_of_lobs IS TABLE OF VARCHAR2(3);
l_table TABLE_OF_LOBS := TABLE_OF_LOBS( 'COM', 'ORG' );
l_count INTEGER := 0;
BEGIN
--check to see if there is enrollment data, if not, move on
SELECT COUNT(*)
INTO l_exists
FROM members
WHERE rownum < 2;

IF l_exists = 1 THEN--data exists, truncate and reload

g_index := dbms_application_info.set_session_longops_nohint;

EXECUTE IMMEDIATE 'TRUNCATE TABLE member_stg';

g_audit_key := p_audit.begin_load
( p_targettable => 'MEMBER_STG',
p_loadsource => 'MEMBER_SOURCE',
p_loadstatus => 'PRE',
p_loadprogram => 'GET_MEMBER_DATA',
p_commenttext => 'INSERT' );

dbms_application_info.set_session_longops
( rindex => g_index,
slno => g_slno,
op_name => 'GETTING MEMBERS',
sofar => 0,
totalwork => l_table.COUNT + 1,
target_desc => 'GETTING MEMBERS' );

FOR i IN 1..l_table.COUNT LOOP
l_count := l_count + 1;

INSERT INTO member_stg
SELECT *
FROM members;

g_total_rows_affected := g_total_rows_affected + sql%rowcount;

COMMIT;

dbms_application_info.set_session_longops
( rindex => g_index,
slno => g_slno,
op_name => 'GETTING MEMBERS',
sofar => l_count,
totalwork => l_table.COUNT + 1,
target_desc => 'GETTING MEMBERS' );

END LOOP;

p_audit.end_load
( p_auditkey => g_audit_key,
p_loadstatus => 'SUC',
p_rowsuccess => g_total_rows_affected );

gather_table_stats
( p_tablename => 'MEMBER_STG',
p_schemaname => 'MYHOME' );

dbms_application_info.set_session_longops
( rindex => g_index,
slno => g_slno,
op_name => 'GETTING MEMBERS',
sofar => l_count + 1,
totalwork => l_table.COUNT + 1,
target_desc => 'GETTING MEMBERS' );

END IF;

EXCEPTION
WHEN others THEN
p_audit.failed_load
( p_auditkey => g_audit_key,
p_comments => SQLCODE || ' ' || SQLERRM );
RAISE;

END get_member_data;

Labels: , , , , ,

 


Guest Authors

How To

Popular

Previous Posts

Code Projects

Archives
August 2007 / September 2007 / October 2007 / November 2007 / December 2007 / January 2008 / February 2008 / March 2008 / April 2008 / May 2008 / June 2008 / July 2008 / August 2008 / September 2008 / October 2008 / November 2008 / December 2008 / January 2009 / February 2009 / March 2009 / April 2009 / May 2009 / June 2009 / July 2009 / August 2009 / September 2009 / October 2009 / November 2009 / December 2009 / January 2010 / February 2010 / March 2010 /


Aggregated by OraNA