Tuesday, September 29, 2009

Database Cleanup: Metrics

Before my current refactor/redesign goes to production, I would like to capture some metrics. I'm fairly limited in what I can actually do (i.e. I can't use DBMS_PROFILER in production).

So far, this is what I have come up with:

1. Lines of Code (LOC) - I don't believe this is necessarily a reflection of good or bad code. For instance, I can take that 2 line INSERT statement and turn it into 20 lines.

Was
INSERT INTO my_table(id, col1, col2, col3, col4, col5, col6, col7, col8, col9 )
VALUES ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 );
Is
INSERT INTO my_table
( id,
col1,
col2,
col3,
col4,
col5,
col6,
col7,
col8,
col9 )
VALUES
( 1,
2,
3,
4,
5,
6,
7,
8,
9,
10 );
That's a pretty sane example. The 2 line version isn't all that bad, but it does run off the page. The point I am trying to make is that "cleaning" up can actually add more lines to your code.

In my opinion, when more than one person is going to support the code, readability is a nice thing. Whether or not you like my style, it is (more) readable. So LOC is not necessarily a great metric, but it can give you an idea which way you are going (after it has been properly formatted anyway).

2. COMMITs - Many argue that there should (almost) never be commits (an exception is logging with the AUTONOMOUS_TRANSACTION pragma) in the database. The calling application should perform the commits. Unfortunately that general rule is not always followed. I've added it to my list of metrics because it is pertinent to our environment. Of course I have gone to great pains to make sure that the removal of one commit will not impact the entire system...that possibility does exist when you have commits everywhere.

3. Text - This was a real stretch. What is the size of the individual procedure, package or function? I wouldn't have considered it (I never have until now), but I was desperate to define something...anything. How do you determine that?
SELECT name, type, SUM( LENGTH( text ) ) t
FROM dba_source
WHERE owner = 'MY_OWNER'
AND name = 'MY_NAME';


4. Dependencies - Also known, to me, as modular code. Why have 200 INSERT statements into a single table when you could wrap that up into a single procedure and call that? If you add a column, you'll still have to go through and fix all those occurences (if it's not defaulted to something). But if you remove a column from that table, it can easily be hidden from the calling code, thus you only have to change it in one place. Of course you wouldn't want to leave it there forever, but it can be done piece-meal, bit by bit as you work on affected parts.

Have you ever thought about this before? What kind of metrics would you suggest? I know mine are a bit of a stretch...so please share.

3 comments:

Oracle Mike said...

Hey Chet,

Would this work for you for #3:

select
type,name,count(0) cnt
,sum(length(text)) len_text
,round(avg(length(text)),0) avg_text_len
from
dba_source group by type,name

Oracle Mike said...

OK...some thoughts on things not related to metrics you've already thought about:
dbname (e.g., report title)
dbsize
number of each kind of object (table, procedure, type, etc.)
size of each object (rows, columns for table, rows of text for types, procedures, views, etc.)
list of each type under it's "type" heading ("type" heading being table, procedure, type, etc.) alphabetically listed
list of each type whose name is related to another object of same type (yeah, algorithm required to match partial name matches, e.g., TBL_EMPLOYEE vs. TBL_EMPLOYEE_NAME or PRC_GET_SALARY vs. PRC_EMP_SALARY) so you can chase down dead wood/duplicate data/code
Your "commit" idea brings up another idea, too. Let's say you'd like to know how the code overall is structured. In that case, I might target keywords like EXECUTE IMMEDIATE, INSERT, UPDATE, DELETE, TRUNCATE, CURSOR, DBMS_OUTPUT, AUTONOMOUS_TRANSACTION, RAISE, COMMIT ... OK, you get the idea. Whatever keyword you're interested in, search the code base of the db for each keyword, list the procedure, function, type, package, etc. the keyword occurs in with the count of instances in that procedure, etc. that the keyword occurs. Of course, you could also list the line of code in drill down fashion where the keyword occurs if you're in the mood to get fancy.
Last time I made an entry here in your blog I think I really missed the mark on what the subject matter was (occurred to me the next day!). Something abour %ROWTYPE. Hope I'm closer this time.

Boneist said...

Personally, I don't think relying on metrics is all that useful. I agree that code should be readable - after all, it's written once, amended infrequently but read lots.

A good site for refactoring ideas is http://www.refactoring.com/catalog/index.html (although it's not PL/SQL based, the ideas can be transferred).

If you have access to Toad, that has CodeXpert, which can be used to go through the code and does pop out a number at the end, as well as various warnings, etc. I don't use it all that much - it's usually painfully obvious when code is badly written!

Nothing beats a decent code review...