Sunday, March 1, 2009

ORA-08177 - II

So my p.i.t.a. DBA won't link back to me, I told him it would be helpful to have both of our perspective's linked up since we're working on the same problem, I'm gonna link to him.

Anyway, last week he was able to pull all the SQL statements from memory that were operating under the SERIALIZABLE isolation level. Pretty cool stuff. You can find his post here.

Also, on Thursday night I ran 4 simultaneous tests of approximately 500 transactions each. While running I found a whole bunch of table locks. Of those tables, I found any ENABLED triggers and DISABLED them. After disabling the 6 triggers, the 2000 or so transactions were successfully completed in about 60 seconds, and the locks on the tables were very brief.

Right before I went to bed I realized though that I had run them in the default mode of READ COMMITTED. I almost got up and re-ran but decided to wait until morning.

Disabling the 6 triggers again and this time doing
before each script fired, they all broke with ORA-08177 within seconds.

On my previous post pertaining to this error, Gary Myers left a well thought out comment that I wanted to highlight here:
The basic nugget is that every statement in the transaction happens as if it all happened at the same instant (ie the SCN that the tansaction started). In the default mode, each SQL runs as of the SCN the statement starts executing.

Wouldn't expect it on a straight SELECT (maybe a SELECT ...FOR UPDATE). The error crops up when you try to get a record in current mode (ie you want to update/delete it) and it has changed since the start of the transaction. In default mode, the statement would lock until that other transaction completed, then restart as of a new SCN. Since in SERIALIZABLE mode it can't use a new SCN, it just falls over.

The 'solutions' are (1) make the transactions complete faster so they are less likely to overlap and (2) grab all the locks you need as early as possible. Shouldn't be trigger related unless they have autonomous transactions.

"usage of serializable isolation level is a design error that leads to non-scalable applications"
Don't agree with the word 'error'. It is a design choice. When you have a transaction of multiple statements, in default mode, the statements do not see a consistent picture of the data - even though each statement is individually consistent. There's a balance between consistency and concurrency. SERIALIZABLE moves that balance a bit more towards consistency.
I've also mapped out the process that produces the error and it ain't pretty. Most points are spawned off by the various triggers. I believe there is an underlying design flaw here as this particular process should be short and sweet. Some of the triggers firing are maintenance of post-processing data which would probably be served better by an asynchronous process taking care of that.


Clever Idea Widgetry said...

Dependency on serializable transactions that involves an application of more than a handful of concurrent users is most definitely a a design error. You might as well use flat files to store your data...

oraclenerd said...

You are quick!

Not sure if it's evident, but I'm fairly tired of the discussion. Perhaps I'm just not ready for the in-depth conversation the debate entails.

However, my gut instinct tells me that, READ COMMITTED being the natural isolation level, READ COMMITTED is what we should use. At least for the time being.

I also believe that there is a fundamental flaw in the design in that it's trying to do too much in this single logical transaction. Some of the code is encapsulated in packaged procedure calls, but the trigger stuff is, well, it's crazy.

Clever Idea Widgetry said...

I blew my FIRST! opportunity. Damn it.