Tuesday, October 18, 2011

KScope + DevOps

Last year I had the pleasure of getting the Sunday Symposium together for KScope 11. This year, I have completed my takeover of the Database track by becoming the track lead.

I thought this was the best job ever, then I was attacked Nancy Kerrigan style by my handlers.

All that said, I think I've gathered a pretty good group of people to help review and select the abstracts for next year's conference (San Antonio, TX).

There will be 4 sub-tracks this year:
- Design/Data Modeling
- Maintenance (Performance, Tuning, Upgrades)
- (Dev)Operations

The one I am most excited about is the (Dev)Operations sub-track, aka, DevOps.

What is DevOps?

I'm glad you asked...

"DevOps" is an emerging set of principles, methods and practices for communication, collaboration and integration between software development (application/software engineering) and IT operations (systems administration/infrastructure) professionals.[1] It has developed in response to the emerging understanding of the interdependence and importance of both the development and operations disciplines in meeting an organization's goal of rapidly producing software products and services.

I am not necessarily a fan of the movement, but I am a fan of the principles behind it.

Every developer has a story about working with an evil DBA. Likewise, every DBA has a story about some application that went to production where they were left completely out of the process.

But it is more than just a simple "Can't we all just get along?" plea; this is about creating better software and streamlining processes.

My personal experience has been one of woeful cooperation, at any level. Our thought, our hope, is that this will help give other Oracle professionals better ideas on how to start down this road.

If you are interested in this topic, sign up. If you want to present on this (or any other) topic, register here.

Tuesday, October 11, 2011

Good DBA, Bad DBA, Deadlock

By Enrique Aviles

A few days ago a fellow DBA asked me to review an email he received from a developer. In the email, the developer explained his application was affected by database errors and asked us to check the attached file for details. The error was a database deadlock. Attached to the email was the trace file Oracle generates whenever a deadlock occurs in the database. I don’t see deadlocks regularly so I hardly ever need to dissect one of those trace files. The trace file contained the following information:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-002b0006-00000968        23     223     X             25      35           X
TX-002c0007-00000b13        25      35     X             23     223           X
session 223: DID 0001-0017-000163D0     session 35: DID 0001-0019-00002809
session 35: DID 0001-0019-00002809      session 223: DID 0001-0017-000163D0
Rows waited on:
  Session 223: obj - rowid = 0001FBA6 - AAAfumAAFAAAWikAAA
  (dictionary objn - 129958, file - 5, block - 92324, slot - 0)
  Session 35: obj - rowid = 0001FBA6 - AAAfumAAFAAAWikAAB
  (dictionary objn - 129958, file - 5, block - 92324, slot - 1)
Session 223 holds an exclusive lock on a row and session 35 holds another exclusive lock on a different row. Session 35 wants to lock session 223's row and vice versa. This clearly shows there is a deadlock. Immediately following this section the SQL statements involved in the deadlock are shown in the trace file. I was expecting to see two different statements but the current session and the “other” session executed exactly the same SQL:
  NAME=:2, FIRST_NAME=:3, 
where DATA_SOURCE_ID=:11
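The deadlock graph in the trace file can be read as a small wait-for graph: each row names a resource, the session holding it, and the session waiting for it. The sketch below is only an illustration of that reading, not something Oracle provides. The function names `build_wait_for` and `find_cycle` are hypothetical, and the session numbers and resource names are copied by hand from the trace above.

```python
# Each row of the deadlock graph: (resource, blocker session, waiter session).
# Values are copied by hand from the trace file shown above.
DEADLOCK_GRAPH = [
    ("TX-002b0006-00000968", 223, 35),
    ("TX-002c0007-00000b13", 35, 223),
]


def build_wait_for(rows):
    """Map each waiting session to the set of sessions it is waiting on."""
    wait_for = {}
    for _resource, blocker, waiter in rows:
        wait_for.setdefault(waiter, set()).add(blocker)
    return wait_for


def find_cycle(wait_for):
    """Return a list of sessions that form a cycle, or None if there is none."""

    def visit(node, path):
        if node in path:
            # We came back to a session already on the path: that is a cycle.
            return path[path.index(node):] + [node]
        for blocker in sorted(wait_for.get(node, ())):
            cycle = visit(blocker, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(wait_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


wait_for = build_wait_for(DEADLOCK_GRAPH)
cycle = find_cycle(wait_for)
print("wait-for graph:", wait_for)
print("cycle:", cycle)
```

A cycle in this graph (session 35 waits on 223, which waits on 35) is exactly what the two rows of the trace describe: neither session can proceed until the other releases its lock.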
The fact that the same UPDATE was executed by both sessions against the same table confused me for a moment. For some reason I expected to see two different tables but found the same table in both UPDATEs. I started thinking one session updated a row and another session wanted to update the same row. In that scenario the second session would just wait for the first session to either commit or roll back the update. Once that happens the exclusive lock on the row is released and the UPDATE from the second session goes through. How can that cause a deadlock?

As you can tell, I didn’t read the trace file closely enough. The rowids above are different, so both sessions were trying to update different rows. Once again, I rushed to faulty reasoning, thinking two sessions updating two different rows should not cause a deadlock. Clearly, Oracle is able to handle two sessions updating two different rows with ease. They are completely independent transactions so there shouldn’t be a deadlock. Remember, I don’t analyze deadlock trace files on a daily basis, so that’s my defense for not being able to immediately explain what caused the deadlock.

After a few moments trying to imagine what could have caused the deadlock I was able to see the full picture. The first session updates row 1, the second session updates row 2. The first session then tries to update row 2 and the second session tries to update row 1. This sequence causes a deadlock. In order to validate my reasoning I opened two SQL*Plus sessions and ran the following:
On session #1:
SQL> UPDATE T SET N = 10 WHERE N = 1;
On session #2:
SQL> UPDATE T SET N = 20 WHERE N = 2;
On session #1:
SQL> UPDATE T SET N = 20 WHERE N = 2; (this one blocks because it’s locked by session #2)
On session #2:
SQL> UPDATE T SET N = 10 WHERE N = 1; (this one causes a deadlock)
After a few seconds the database reported a deadlock on session #1. As a result, the second update on session #1 was rolled back. After issuing a commit on both sessions I noticed the table no longer contained two rows with 1 and 2 but with 10 and 20. No updates were lost because both sessions tried to update the table with the same values. The same would happen if the UPDATEs on PERSON_TAB contained the same values in all columns. If PHONE_NUMBER were different in the two sessions, one of the values would be lost as a result of the deadlock.

With this information in hand my colleague replied to the email with a detailed explanation of what caused the deadlock and provided the small test case to help the developer reproduce the issue. We also supplied the SQL showing the table involved in the deadlock.
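The four-step sequence from the SQL*Plus test can also be modelled without a database. The sketch below is a toy lock table, not how Oracle implements row locks. The class `RowLocks` and its methods are hypothetical names, and the model only reports the moment the wait cycle closes. It says nothing about which session Oracle picks to receive the error (in the test above it was session #1).

```python
class RowLocks:
    """Toy model of exclusive row locks held until commit (illustration only)."""

    def __init__(self):
        self.owner = {}    # row -> session holding the exclusive lock
        self.waiting = {}  # session -> row that session is blocked on

    def update(self, session, row):
        """Try to lock a row for update; return 'locked', 'blocked' or 'deadlock'."""
        holder = self.owner.get(row)
        if holder is None or holder == session:
            self.owner[row] = session
            return "locked"
        # The row is held by another session, so this session has to wait.
        self.waiting[session] = row
        if self._closes_cycle(session):
            return "deadlock"
        return "blocked"

    def _closes_cycle(self, start):
        """Follow 'waits on' links from start; True if they lead back to start."""
        seen = {start}
        current = start
        while current in self.waiting:
            current = self.owner[self.waiting[current]]
            if current == start:
                return True
            if current in seen:
                return False  # a cycle exists, but start is not part of it
            seen.add(current)
        return False


locks = RowLocks()
results = [
    locks.update(1, "row 1"),  # session #1 updates row 1
    locks.update(2, "row 2"),  # session #2 updates row 2
    locks.update(1, "row 2"),  # session #1 waits for session #2
    locks.update(2, "row 1"),  # session #2 waits for session #1: cycle
]
print(results)
```

The first two updates succeed, the third blocks, and only the fourth closes the cycle. That is why the same UPDATE statement on the same table can still deadlock: what matters is the order in which each session touches the rows.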

We acted as good DBAs (we think so) because we took the time to examine the trace file, compose a detailed explanation, and supply steps on how to reproduce the issue.

What would a bad DBA do if faced with the same request? The bad DBA would open the trace file and copy the following section into a reply email:
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL.
The email would close with a simple “fix your code”.