Showing posts with label dba. Show all posts

Monday, August 26, 2013

DBA or Developer?

I've always considered myself a developer and a LOWER(DBA). I've recovered perhaps one database, and that was just a sandbox, nothing production worthy. I've built out instances for development and testing and I've installed the software a few hundred times, at least. I've done DBA-like duties, but I just don't think of myself that way. I'm a power developer maybe? Whatevs.

I'm sure it would be nearly impossible to come up with One True Definition of The DBA ™. So I won't.

I've read that Tom Kyte does not consider himself a DBA, but I'm not sure most people know that. From Mr. Kyte himself:



At the same conference, I asked Cary Millsap the same question:



I read Cary for years and always assumed he was a DBA. I mean, have you read his papers? Have you read Optimizing Oracle Performance? Performance? That's what DBAs do (or so I used to think)!

It was only after working with him at #kscope11 on the Building Better Software track that I learned otherwise.

Perhaps I'll make this a standard interview question in the future...

Semi-related discussions:

1. Application Developers vs. Database Developers
2. Application Developers vs. Database Developers: Part II

Wednesday, August 1, 2012

Building an OBIEE Test Lab - Part II

So it's taking a little longer than I anticipated. That's a good thing (for me anyway). In Part I, I tried to diagram out what my plans were. My drawing was crude, and remains so. Mike Durran left me a link to the drawing below which is found here in the docs.



Now I'm an ambitious lad, but not that ambitious. This is for fun and amusement, perhaps I'll learn a thing or two along the way.

So how does this qualify as Part II? Well, I finally created the database. (No, I'm not going RAC like the diagram and docs say. Like I said, I'm not completely nutters.) This is an accomplishment for me as it's been quite some time since I've built anything, close to a year. So I'm celebrating by writing.
Oracle Enterprise Linux 64 bit, 5.8
2.6.32-300.32.1.el5uek
Database: 11.2.0.1
Database host: oracle-db
Database SID: TESTING
I used OEL 5.8 because that's the latest version I could find that was certified for 11.2.0.1 (straight from OTN). I was unable to find a single document that listed out the supported OSes, but I'm not that adept at searching through MOS yet (LOWER( DBA )). If you know where it is, please link it up.

This database will serve as the repository home for OBIEE and perhaps something like Identity Manager (OVM, OAM). I'll probably end up using it as a source and target for various projects as well.

Thursday, December 15, 2011

Trace Data: Not Just For DBAs

On Wednesday, I attended Cary Millsap's Mastering Oracle Trace Data class here in Tampa.

Why?

Why would I go? I am working with OBIEE, which is about 42 layers above the database...who cares about trace data? Well, performance is usually the number 1 item on everyone's list. Pages load too slowly. Reports take too long to run. Whether it's trying to tune the application server (WLS) or figure out why Report A takes so long to run, we do a lot of performance analysis. In most cases, it ends up being the SQL that is run. By "the SQL" I mean it's usually bringing back quite a few records. I've seen people try to pull back millions of records and then pivot all that data, putting a nice load on the BI Server and hogging all the resources from everyone else.

On occasion though, there are other things that are going on (with the SQL) that we can't pinpoint.

Recently we had to back out a production implementation because one of the load processes seemed to just hang, threatening to slow down a larger process.

I asked the DBAs why.

Crickets.

Shouldn't that be an answer a DBA provides?

Disk? Network? CPU? Memory? Which one?

Crickets. (I didn't ask those exact questions, I think I said, "Reads? Writes? Network? Load?")

That is just one of the reasons I wanted to attend Mr. Millsap's class. That, and I've heard he's well regarded and does a pretty decent job presenting. OK, I admit it, I just want to show the DBA up. There, said it.

I really shouldn't have to though. It's supposed to be a partnership. They don't know much about OBIEE, so I help them there. I expect help in things like this.

Why? Part II

If you are a developer, understanding trace data will make you better. You'll no longer have to guess, you'll know.

Of course there's what I hinted at above, being able to go to your DBA(s) and prove something. No better feeling in the world.
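
All of these tools consume Oracle's extended SQL trace data. If you want to generate some to play with, a minimal sketch (using the supported DBMS_MONITOR interface, available in 10g and later) looks something like this:

```sql
-- Trace your own session, including wait events and bind values
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(waits => TRUE, binds => TRUE);

-- ...run the slow report or statement here...

EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE;

-- On 11g, this shows where the trace file landed
SELECT value FROM v$diag_info WHERE name = 'Default Trace File';
```

Hand the resulting file to MR Trace, mrskew, or the Profiler and go from there.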

How?

MR Trace is by far the easiest. It's integrated with SQL Developer. It's a great place to start.

MR Tools - For the more advanced professional. Mostly geared towards the DBA type, but incredibly useful to developers as well. It includes:

- mrskew: your trace file profiler and data miner
- mrls: your trace file lister
- mrcallrm: your trace file correction fluid
- mrtrim: your trace file tim calculator
- mrtrimfix: your trace file time value fixer

Method R Profiler:
The classic tool that started it all, the Method R Profiler is software that makes simple work of knowing exactly why your application consumes the response time it does. With minimal training, a Method R Profiler user can—in just minutes—identify the root cause of an Oracle-based application performance problem, propose sensible solutions to it, and predict the end-user response time impact of each proposed solution.

There are of course other products, check them out here.

Ask Mr. Millsap to come to your town. Try out MR Trace. You won't be sorry you did.

Tuesday, October 11, 2011

Good DBA, Bad DBA, Deadlock

By Enrique Aviles

A few days ago a fellow DBA asked me to review an email he received from a developer. In the email, the developer explained his application was affected by database errors and asked us to check the attached file for details. The error was a database deadlock. Attached to the email was the trace file Oracle generates whenever a deadlock occurs in the database. I don’t see deadlocks regularly so I hardly ever need to dissect one of those trace files. The trace file contained the following information:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-002b0006-00000968        23     223     X             25      35           X
TX-002c0007-00000b13        25      35     X             23     223           X
 
session 223: DID 0001-0017-000163D0     session 35: DID 0001-0019-00002809
session 35: DID 0001-0019-00002809      session 223: DID 0001-0017-000163D0
 
Rows waited on:
  Session 223: obj - rowid = 0001FBA6 - AAAfumAAFAAAWikAAA
  (dictionary objn - 129958, file - 5, block - 92324, slot - 0)
  Session 35: obj - rowid = 0001FBA6 - AAAfumAAFAAAWikAAB
  (dictionary objn - 129958, file - 5, block - 92324, slot - 1)
Session 223 holds an exclusive lock on a row and session 35 holds an exclusive lock on a different row. Session 35 wants to lock session 223's row and vice versa. This clearly shows there is a deadlock. Immediately following this section, the SQL statements involved in the deadlock are shown in the trace file. I was expecting to see two different queries, but the current session and the "other" session executed exactly the same SQL:
UPDATE PERSON_TAB SET PERSON_ID=:1, 
  NAME=:2, FIRST_NAME=:3, 
  MIDDLE_NAME=:4, 
  LAST_NAME=:5, 
  DATE_OF_BIRTH=:6, 
  PHONE_NUMBER=:7, 
  ALT_PHONE_NUMBER=:8, 
  FAX_NUMBER=:9, 
  E_MAIL=:10 
where DATA_SOURCE_ID=:11
The fact that the same UPDATE was executed by both sessions against the same table confused me for a moment. For some reason I wanted to see two different tables but found the same table in both UPDATEs. I started thinking one session updated a row and another session wanted to update the same row. In that scenario the second session would just wait for the first session to either commit or roll back the update. Once that happens, the exclusive lock on the row is released and the UPDATE from the second session goes through. How can that cause a deadlock?

As you can tell, I didn't read the trace file closely enough. The rowids above are different, so the two sessions were trying to update different rows. Once again I rushed to faulty reasoning, thinking two sessions updating two different rows should not cause a deadlock. Clearly, Oracle is able to handle two sessions updating two different rows with ease. They are completely independent transactions, so there shouldn't be a deadlock. Remember, I don't analyze deadlock trace files on a daily basis; that's my defense for not being able to immediately explain what caused this one.

After a few moments trying to imagine what could have caused the deadlock, I was able to see the full picture. The first session updates row 1, the second session updates row 2. Then the first session tries to update row 2 and the second session tries to update row 1. This sequence causes a deadlock. In order to validate my reasoning I opened two SQL*Plus sessions and ran the following:
On session #1:
SQL> CREATE TABLE T (N NUMBER);
SQL> INSERT INTO T VALUES (1);
SQL> INSERT INTO T VALUES (2);
SQL> COMMIT;
SQL> UPDATE T SET N = 10 WHERE N = 1;
On session #2:
SQL> UPDATE T SET N = 20 WHERE N = 2;
On session #1:
SQL> UPDATE T SET N = 20 WHERE N = 2; (this one blocks because it’s locked by session #2)
On session #2
SQL> UPDATE T SET N = 10 WHERE N = 1; (this one causes a deadlock)
After a few seconds the database reported a deadlock on session #1. As a result, the second UPDATE on session #1 was lost. After issuing a commit on both sessions I noticed the table didn't contain 1 and 2 but 10 and 20. No updates were effectively lost, because both sessions tried to update the table with the same values. The same would happen if the UPDATEs on PERSON_TAB contained the same values in all columns; if PHONE_NUMBER had been different in the two sessions, one of the changes would have been lost as a result of the deadlock.

With this information in hand, my colleague replied to the email with a detailed explanation of what caused the deadlock and provided the small test case to help the developer reproduce the issue. We also supplied the SQL showing the table involved in the deadlock.
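
The usual remedy, for what it's worth, is to have every session acquire its row locks in the same deterministic order, or to lock all the rows it needs up front. A sketch against the same T table from above:

```sql
-- Lock both rows in a consistent order (by N) before updating;
-- two sessions running this can block each other, but never deadlock.
SELECT n FROM t WHERE n IN (1, 2) ORDER BY n FOR UPDATE;

UPDATE t SET n = 10 WHERE n = 1;
UPDATE t SET n = 20 WHERE n = 2;
COMMIT;
```

With the locks taken in the same order by everyone, the second session simply waits at the SELECT ... FOR UPDATE instead of deadlocking mid-transaction.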

We acted as good DBAs (we think so) because we took the time to examine the trace file, compose a detailed explanation, and supply steps on how to reproduce the issue.

What would a bad DBA do if faced with the same request? The bad DBA would open the trace file and copy the following section on a reply email:
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL.
The email would close with a simple “fix your code”.

Tuesday, September 13, 2011

Pythian Acquires "Dude Who Gets Stuff Done"

Pythian, the world's leading remote DBA service company, has announced the (non-hostile) acquisition of John Piwowar.

Wait, Pythian didn't announce it, officially anyway...



Source



Source

Maybe it was John?

...Starting today, I begin the "on-boarding" process at Pythian, as a member of one of their ERP teams. In a way, it feels as much like starting school as it does a new job. This move represents my biggest career challenge to date, and I look forward to learning a ton (and contributing a ton) in the company of a lot of smart, talented people. Since I'm going to be packing my brain full of new things over the next 3 weeks, I expect that the only blog activity here will be from pre-scheduled posts, drafts that I'd been putting off finishing until recently. Once I'm up to speed on the new system, you'll hopefully be hearing more from me on the Pythian blog!

I think it's obvious from John's post how excited he is. I am excited for him too. John's an awesome dude, sherpa (for me), protector (of me, from myself), and DBA.

Find John in the following locations:

- Twitter
- Blog
- LinkedIn
- oraclenerd

OK, this didn't turn out as funny as I had intended, whatever.

Great move for both parties. Can't wait to see what John does with the vast resources available to him now.

Wednesday, September 7, 2011

DROP DATABASE;

I decided to blow away one of my newly created databases. Instead of using the DBCA, I decided to try it manually.
DROP DATABASE;

DROP DATABASE
*
ERROR at line 1:
ORA-01586: database must be mounted EXCLUSIVE and not open for this operation
Not it.
SQL> SHUTDOWN;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> STARTUP MOUNT EXCLUSIVE;
ORACLE instance started.

Total System Global Area 1048059904 bytes
Fixed Size                  2219992 bytes
Variable Size             608174120 bytes
Database Buffers          432013312 bytes
Redo Buffers                5652480 bytes
Database mounted.
SQL> DROP DATABASE;
DROP DATABASE
*
ERROR at line 1:
ORA-12719: operation requires database is in RESTRICTED mode
Ugh.
SQL> STARTUP NOMOUNT;
ORACLE instance started.

Total System Global Area 1048059904 bytes
Fixed Size                  2219992 bytes
Variable Size             608174120 bytes
Database Buffers          432013312 bytes
Redo Buffers                5652480 bytes
SQL> DROP DATABASE;
DROP DATABASE
*
ERROR at line 1:
ORA-01507: database not mounted


SQL> ALTER DATABASE MOUNT;

Database altered.

SQL> DROP DATABASE;
DROP DATABASE
*
ERROR at line 1:
ORA-12719: operation requires database is in RESTRICTED mode
Not it. Here's the SQL Reference. Maybe that RESTRICTED reference means something...
SQL> STARTUP MOUNT RESTRICTED;
ORACLE instance started.

Total System Global Area 1048059904 bytes
Fixed Size                  2219992 bytes
Variable Size             608174120 bytes
Database Buffers          432013312 bytes
Redo Buffers                5652480 bytes
ORA-01504: database name 'RESTRICTED' does not match parameter db_name
'TESTING'
Barnacles. Perhaps the Administrator's Guide has something.
SQL> STARTUP MOUNT TESTING;
ORACLE instance started.

Total System Global Area 1048059904 bytes
Fixed Size                  2219992 bytes
Variable Size             608174120 bytes
Database Buffers          432013312 bytes
Redo Buffers                5652480 bytes
Database mounted.
SQL> DROP DATABASE;
DROP DATABASE
*
ERROR at line 1:
ORA-12719: operation requires database is in RESTRICTED mode


SQL> ALTER DATABASE RESTRICTED;
ALTER DATABASE RESTRICTED
                        *
ERROR at line 1:
ORA-02231: missing or invalid option to ALTER DATABASE


SQL> ALTER DATABASE READ ONLY;
ALTER DATABASE READ ONLY
                    *
ERROR at line 1:
ORA-02231: missing or invalid option to ALTER DATABASE
Last try:
SQL> STARTUP NOMOUNT RESTRICT;
ORACLE instance started.

Total System Global Area 1048059904 bytes
Fixed Size                  2219992 bytes
Variable Size             608174120 bytes
Database Buffers          432013312 bytes
Redo Buffers                5652480 bytes
SQL> DROP DATABASE;
DROP DATABASE
*
ERROR at line 1:
ORA-01507: database not mounted


SQL> ALTER DATABASE MOUNT;

Database altered.

SQL> DROP DATABASE;

Database dropped.

Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
Voila!
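
For the record, distilled from the transcript above, the sequence that finally worked:

```sql
SHUTDOWN;
STARTUP NOMOUNT RESTRICT;  -- the keyword is RESTRICT, not RESTRICTED
ALTER DATABASE MOUNT;
DROP DATABASE;
```

Presumably STARTUP MOUNT RESTRICT gets you there in one step; the keyword I kept missing was RESTRICT.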

Tuesday, August 2, 2011

Mastering Oracle Trace Data

From The Twitter Machine today:



Which takes you here.

I had the pleasure of meeting Mr. Millsap in person last year at OOW. Since then, I've seen him present at KScope and more recently at our local user group.

I can't tell you how much I haven't processed yet.

Anyway, he had mentioned to me that he was considering this one day Masterclass event, but hadn't figured out all the details. He has now.

So it looks like he only has an event set up for Dallas/Ft. Worth, but I want him to come back and visit Florida. We're often left out of these types of events (everyone neglects America's...ok, I won't go there).

Why?
I have an obsession about proving things...I'm tired of getting beat down by people who say "prove it."

With that in mind, I'm starting a petition of sorts, to lure Mr. Millsap into coming back to Florida.

If you're interested in having this event down here in Florida, email me and let me know. If enough people are interested, I'll bug Mr. Millsap until he agrees to come back.

Update
When Mr. Millsap was in town, after he presented, he was showing us some material from this class he was thinking about.

If you read Jake's musings over at the AppsLab, you'll know he (and Rich and Anthony) have gone to Google IO for the past 2 years. This year they came away with, among other things, a Samsung Galaxy Tab 10.1 and a Chromebook (which shipped very recently).

Mr. Millsap is doing something similar. Not only do you get to learn all about trace data, you get the tools to do so:

- Mastering Oracle Trace Data booklet, by Cary Millsap
- A limited license for MR Trace, the zero-click trace file collector for Oracle SQL Developer
- A limited license for MR Tools, the premiere software for trace file mining, management, and manipulation
- A Method R “I can help you trace it” T-shirt

So sign up for the DFW class, or let me know so we can bring Mr. Millsap back to Florida.

Tuesday, July 26, 2011

Managing Database Entries (tnsnames.ora)

For the past 18 months or so I've been arguing that if you would just manage your tnsnames.ora file, things would be much easier. I'm talking about through your various SDLC environments, DEV --> QA --> PROD.

This is true for tools like OBIEE.

In OBIEE, you can use the OCI client and specify your tnsname entry for a particular connection. Of course you'll still have to change the password for the connection, but you'll never be at risk of accidentally connecting to the wrong database.

Guess what? My worst case scenario occurred.

Here's what I have suggested: use a generic name for your connection. Let's say TESTING. How it works right now is we have one for each environment: TESTING_DEV, TESTING_QA and TESTING_PROD.

Each server has a tnsnames.ora file with every single connection.

For OBIEE, that means not only changing the password for each environment, but changing the DNS (TNS) entry as well.

Only bad things can come from this.

(I realize there are much more sophisticated ways of managing this, OID (I believe) for instance, but that's outside the scope).

So what happened?

Last week I built out an Informatica PowerCenter server which connected to the Dev database. Following that, I wrote up the instructions, including the particulars of my installation.

The next person up the chain, installing the QA software, read it literally.

Guess what happened?

The Dev server got borked because they used the Dev connection information and all the configuration stuff (technical term) got messed up.

Guess what else?

I get to rebuild the dev machine.

I would contend that the QA server should only have an entry for the QA database...that way this type of thing would never occur. If we had used a generic name for the database, say TESTING, I wouldn't be working tonight.
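
To sketch the idea (the hostnames here are made up): each environment's server carries a tnsnames.ora with the same generic alias, and nothing else, so the same connection definition works everywhere and can't point at the wrong database.

```
# tnsnames.ora on the DEV server -- its only entry
TESTING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dev-db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = TESTING)))

# tnsnames.ora on the QA server -- same alias, different host
TESTING =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = qa-db.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = TESTING)))
```

Any tool configured to use the TESTING alias then connects to whatever that box's tnsnames.ora says TESTING is, and nothing else.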

Something to think about when you end up managing, not only multiple servers, but multiple "platforms" as well.

Sunday, July 17, 2011

SOUG: Cary Millsap

We (SOUG) have the fortunate pleasure of hosting Mr. Cary Millsap.

Perhaps you've heard of him?

The topic is Thinking Clearly About Performance.

Here's the abstract:

Creating high performance as an attribute of complex software is extremely difficult business for developers, technology administrators, architects, system analysts, and project managers. However, by understanding some fundamental principles, performance problem solving and prevention can be made far simpler and more reliable. This paper describes those principles, linking them together in a coherent journey covering the goals, the terms, the tools, and the decisions that you need to maximize your application’s chance of having a long, productive, high-performance life. Examples in this paper touch upon Oracle experiences, but the scope of the paper is not restricted to Oracle products.

Here's Mr. Millsap's Bio (note the inclusion of the word "teacher", my favorite description):

Cary Millsap / @CaryMillsap / cary.millsap@method-r.com

Cary Millsap is a public speaker, author, teacher, software designer and developer, entrepreneur, and software technology advisor. In 2008, he founded Method R Corporation, a company dedicated to making people's application software run faster and more efficiently. Mr. Millsap's technical papers are quoted in many Oracle books, in Wikipedia, in blogs all over the world, and in dozens of conference presentations each month. His blog reaches thousands of people each month. He has presented at hundreds of public and private events around the world. He wrote the book "Optimizing Oracle Performance" (O'Reilly 2003), for which he and co-author Jeff Holt were named Oracle Magazine's 2004 Authors of the Year.

RSVP here.

Thursday, July 14, 2011

How To: Run a Silent Install with Response File

Many of you have done this many, many times.

Tonight was my first.

Environment:
HP-UX, 64bit, Itanium (oops).

I'm using cygwin to ssh into the server. Thanks to John Scott for helping me figure out how to run the GUI installer, well, pointing me in the right direction.

(At one point I asked The Twitter Machine for help, offering T-Shirts in trade for services rendered. Martin Berger, as usual, offered to help. Also, Fuad Arshad, TedC and others that I can't remember right now, I'm sorry).



I tried running xclock and got an error:
$ xclock
Error: Can't open display:
Ah, when I run the ssh command, I have to add a switch/flag thingy in there.
$ ssh -X oraclenerd@hpux
I found that solution here.

After restarting my session, I was able to successfully run xclock. OK, let's try it with the installer.

Wait a second...this line is too slow. 10 minutes for the GUI console to render. Don't ask about trying to do anything, like specify the directory or something silly like that.

I was back to the console method.

However, the first time I did it successfully, I got the Instant Client, which I didn't want.

How do I specify which installation type I want while using the console? I was not prompted for anything.

Reading around...response file!

Read about response files here.

Like many things that intimidate me, I never bothered to look into it...but it was easy, too easy. That's about the point I kicked myself for being intimidated by that fancy DBA stuff.

So I grabbed the sample file out of response directory, edited it to suit my needs, and I was off and running again.
$ ./runInstaller -silent -responseFile /tmp/client/response/client_response.rsp
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 415 MB. Actual 772 MB Passed
Checking swap space: must be greater than 150 MB. Actual 8192 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2011-07-14_10-22-31PM. Please wait ...

$ You can find the log of this install session at:
/opt/oracle/oraInventory/logs/installActions2011-07-14_10-22-31PM.log
The following configuration scripts need to be executed as the "root" user.
#!/bin/sh
#Root scripts to run

/opt/oracle/product/11.2.0/client/root.sh
To execute the configuration scripts:
1. Open a terminal window
2. Log in as "root"
3. Run the scripts
4. Return to this window and hit "Enter" key to continue

Successfully Setup Software.

Or so I thought...now I need to run root.sh as root? I'm not root, and I'm not in the sudoers file (or whatever it's called). It took 3 days for me to get the necessary privs to write to /opt/oracle.

So now I have to wait for someone with those privs to complete the install.

But hey, I learned something new today.

Sunday, July 3, 2011

KScope 11: Top Tips to Get Optimal SQL Execution All the Time

This was my final session of the conference.

Prior to this one was Jean-Pierre Dijcks on Managing Parallel Execution without Tuning in 11g Release 2. Conveniently, it was in the same room. Especially convenient given that it was in the same room the day after the Queen Mary event. I did want to see JP's other talk, Speed Up Your Data Warehouse with in-memory Processing, but it was just too much effort for me.

Besides, I had never seen Maria speak and I was trying to broaden my horizons. I wish I had seen her speak years ago...would have made life a lot easier.

Maria is not on Twitter but you can find her writing at the Oracle Optimizer blog.

The optimizer subject has always been a bit intimidating to me. I don't know why, it just has. I've mostly glossed over it.

No longer.

Despite my state (post Queen Mary event), a lot of things came together. The optimizer really isn't magic. Surprise!

The importance of statistics, which I've always known but never truly understood, was made clear. Extended statistics? Neat. Histograms, explained (better).

Awesome stuff...and really not as magical as you might think. Take a gander at the Optimizer group's Explain the Explain Plan for more details.

Friday, June 3, 2011

Developer Access to V$ Views in Production

I love The Twitter Machine.


Assuming you made it all the way through...how freaking awesome. Got a question? The Twitter Machine and a lot of really smart people have the answer.

Wednesday, May 4, 2011

The &DBA

Pronounced, Ampersand DBA.

I'm hereby naming all my future DBAs that meet certain criteria, &DBA.

Over the years, at about a million different organizations, I have met many DBAs. Many, at first glance, fit the DBA stereotype; cranky, condescending, arrogant, etc. After further review, many of those traits still apply. Some however turn out to be the opposite; they just put on a game face for the new guy or gal because they've been through this before...probably to be disappointed again that the developer will 1, leave them out of design discussions 2, throw bad SQL over the wall, 3, not use bind variables, ever and 4, well, all the other bad things we developers do.

The type that I have least encountered are the DBAs who actively seek out and engage developers. Hey, can I help you with that? Hey, you aren't using bind variables, here are the reasons you should. They know that in the long run, the better the developer, the easier their job will be.

I love that aspect of my job. I like helping others learn to do something better or more efficiently. I like to encourage others to download and install the database. But that's not the point of this post...

I needed a name for those DBAs who are 1, less qualified than I am to be a DBA (short list, I am under no illusions), 2, are wholly incompetent and 3, are jerks.

After a recent incident, as relayed by others, the &DBA was born.

What does the & mean?

In SQL*Plus, the & is a special character. If you run scripts you can parameterize them with &COLUMN_VALUE or something, right?

That's great for scripts.

It can go horribly wrong with code. At least the first time you try to load code via SQL*Plus. Here's the original:
CREATE OR REPLACE
FUNCTION foo RETURN VARCHAR2
AS
l_retval VARCHAR2(10);
l_value VARCHAR2(30) := 'Barnes & Noble';
BEGIN
SELECT 'IT WORKS!'
INTO l_retval
FROM dual
WHERE 'Barnes & Noble' = l_value;

RETURN l_retval;

END foo;
/
show errors
and here's what happens when I run it:
CJUSTICE@NO_CLUE>CREATE OR REPLACE
FUNCTION foo RETURN VARCHAR2
2 AS
3 l_retval VARCHAR2(10);
4 l_value VARCHAR2(30) := 'Barnes & Noble';
5 BEGIN
6 SELECT 'IT WORKS!'
7 INTO l_retval
8 FROM dual
9 WHERE 'Barnes & Noble' = l_value;
10
11 RETURN l_retval;
12
13 END foo;
14 /
Enter value for noble: show errors
old 3: l_value VARCHAR2(30) := 'Barnes & Noble';
new 3: l_value VARCHAR2(30) := 'Barnes show errors';
Enter value for noble: testing
old 8: WHERE 'Barnes & Noble' = l_value;
new 8: WHERE 'Barnes testing' = l_value;

Warning: Function created with compilation errors.
This is how it is stored in the database:
FUNCTION foo RETURN VARCHAR2
AS
l_retval VARCHAR2(10);
l_value VARCHAR2(30) := 'Barnes show errors';
BEGIN
SELECT 'IT WORKS!'
INTO l_retval
FROM dual
WHERE 'Barnes testing' = l_value;

RETURN l_retval;

END foo;

12 rows selected.
Yeah, that won't work.

To fix this, you simply set a SQL*Plus system variable in your session: SET DEFINE OFF (I believe there is another, but this is the one I use). Like this:
CJUSTICE@NO_CLUE>SET DEFINE OFF
CJUSTICE@NO_CLUE>
CJUSTICE@NO_CLUE>CREATE OR REPLACE
2 FUNCTION foo RETURN VARCHAR2
3 AS
4 l_retval VARCHAR2(10);
5 l_value VARCHAR2(30) := 'Barnes & Noble';
6 BEGIN
7 SELECT 'IT WORKS!'
8 INTO l_retval
9 FROM dual
10 WHERE 'Barnes & Noble' = l_value;
11
12 RETURN l_retval;
13
14 END foo;
15 /

Function created.

Elapsed: 00:00:00.22
CJUSTICE@NO_CLUE>show errors
No errors.
CJUSTICE@NO_CLUE>
CJUSTICE@NO_CLUE>SET DEFINE ON
All is now well in the world.

Now this might be vexing on your first couple of weeks on the job, but you quickly pick it up.

This is where the name comes from...

SET DEFINE OFF
&DBA
SET DEFINE ON


said that this was a known Oracle bug.

Really.

Wednesday, April 20, 2011

Yet Another Slow Query

by Enrique Aviles

As the database performance contact I get to troubleshoot slow queries. Typically when users experience a slow application screen an incident is submitted and the issue ends up on my plate. Recently, I had to investigate a report that was taking a little over 10 minutes to generate results. Although the application is an OLTP system some screens allow users to generate a report that can be sent to the printer. Normally these reports take a few seconds to appear on the screen but this one was way over the application’s SLA requirements.

Below is a sample query construct that captures the syntax of the actual query that was executed by the application. After examining execution plans and generating extended SQL trace I identified the parts highlighted in red as the ones contributing to the query’s poor response time.
SELECT T1.COLUMN_1,
T1.COLUMN_2,
V1.COLUMN_1,
V1.COLUMN_2,
T2.COLUMN_1,
T2.COLUMN_2,
V2.COLUMN_1,
V2.COLUMN_2,
CV.USAGE,
.
.
.
24 additional columns
.
.
.
FROM
TABLE_1 T1,
TABLE_2 T2,
TABLE_3 T3,
TABLE_4 T4,
VIEW_1 V1,
VIEW_2 V2,
COMPLEX_VIEW CV
.
.
An additional mix of 7 tables and views
.
.
WHERE T1.COLUMN_4 = T2.COLUMN_5
AND T2.COLUMN_7 = T3.COLUMN_8
.
.
.
AND
CV.PARAMETER_ID = T3.ID
AND CV.PART_ID = T4.ID
AND CV.LOG_DATE = (
SELECT MAX(CV2.LOG_DATE)
FROM COMPLEX_VIEW CV2
WHERE CV2.PART_ID = CV.PART_ID
AND CV2.PARAMETER_ID = CV.PARAMETER_ID
)

AND T5.ID = 'ABC123'
The query’s runtime statistics are:
Elapsed: 00:13:13.01

Statistics
----------------------------------------------------------
4013 recursive calls
6372 db block gets
65682217 consistent gets
858907 physical reads
0 redo size
3328 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
3 sorts (memory)
1 sorts (disk)
1 rows processed
The query took more than 13 minutes and over 65 million consistent gets to select one row. This is obviously unacceptable so it is clear why users were not happy with that particular report. What could be causing the query to generate so much work?

The complex view is composed of twelve tables and 40 lines in the WHERE clause, five of them OR conditions. I noticed that removing the subquery that gets the maximum LOG_DATE from the complex view helped the main query to complete in just a few seconds. Obviously, results were incorrect without the subquery so I had to figure out a way to preserve the logic that gets the maximum LOG_DATE while having the query complete in a matter of seconds.

Examining the complex view data showed there were no duplicate LOG_DATEs so the MAX aggregate function will always return the absolute maximum LOG_DATE for a given PART_ID/PARAMETER_ID combination. Finding that characteristic in the data led me to consider using a scalar subquery to get the USAGE value from the complex view. In the process I also wanted to select from the complex view in one pass so I decided to use the ROW_NUMBER analytic function to get the maximum LOG_DATE and eliminate the need for a self-join via a correlated subquery. Having devised that plan, I executed the following query to test what would become the scalar subquery:
SELECT USAGE
FROM (
SELECT CV.USAGE,
ROW_NUMBER() OVER (PARTITION BY CV.PART_ID,
CV.PARAMETER_ID
ORDER BY CV.LOG_DATE DESC) RN
FROM COMPLEX_VIEW CV
WHERE CV.PARAMETER_ID = 'ABC'
AND CV.PART_ID = 'XYZ'
)
WHERE RN = 1


Elapsed: 00:00:00.02

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
738 consistent gets
0 physical reads
0 redo size
532 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
Only 0.02 seconds and 738 consistent gets to select the USAGE value from the complex view! It looks like the plan is coming together. I proceeded to replace the correlated subquery with the scalar subquery, so the main query becomes:
SELECT  T1.COLUMN_1,
T1.COLUMN_2,
V1.COLUMN_1,
V1.COLUMN_2,
T2.COLUMN_1,
T2.COLUMN_2,
V2.COLUMN_1,
V2.COLUMN_2,
(
SELECT USAGE
FROM (
SELECT CV.USAGE,
ROW_NUMBER() OVER (PARTITION BY CV.PART_ID,
CV.PARAMETER_ID
ORDER BY CV.LOG_DATE DESC) RN
FROM COMPLEX_VIEW CV
WHERE CV.PARAMETER_ID = T3.ID
AND CV.PART_ID = T4.ID
)
WHERE RN = 1
) USAGE,

.
.
.
24 additional columns
.
.
.
FROM TABLE_1 T1,
TABLE_2 T2,
TABLE_3 T3,
TABLE_4 T4,
VIEW_1 V1,
VIEW_2 V2,
.
.
An additional mix of 7 tables and views
.
.
WHERE T1.COLUMN_4 = T2.COLUMN_5
AND T2.COLUMN_7 = T3.COLUMN_8
.
.
.
AND T5.ID = 'ABC123'
Notice the complex view is not part of the FROM section and there are no joins to the complex view in the WHERE clause. I executed the new and improved query and got the following error:
ERROR at line 17:
ORA-00904: "T4"."ID": invalid identifier
The excitement was extinguished for a brief moment while I realized my mistake. How come T4.ID is an invalid identifier when I know ID is a valid column on TABLE_4? The problem is that TABLE_4 is not visible inside the inline view of the scalar subquery: the reference to the complex view sits two levels deep, and a correlated column can only be referenced one level down, so I can’t join CV and T4 from there. How can I hide the logic of the scalar subquery in a way that allows me to join the complex view and TABLE_4? A view that implements the core logic of the scalar subquery achieves the desired result, so I created the following view:
CREATE OR REPLACE VIEW CURRENT_USAGE
AS
SELECT USAGE,
PART_ID,
PARAMETER_ID
FROM
(
SELECT CV.USAGE,
CV.PART_ID,
CV.PARAMETER_ID,
ROW_NUMBER() OVER (PARTITION BY CV.PART_ID,
CV.PARAMETER_ID
ORDER BY CV.LOG_DATE DESC) RN
FROM COMPLEX_VIEW CV
)
WHERE RN = 1;
The main query needs to be modified again to reference the newly created view CURRENT_USAGE so it becomes:
SELECT  T1.COLUMN_1,
T1.COLUMN_2,
V1.COLUMN_1,
V1.COLUMN_2,
T2.COLUMN_1,
T2.COLUMN_2,
V2.COLUMN_1,
V2.COLUMN_2,
(
SELECT USAGE
FROM CURRENT_USAGE CU
WHERE CU.PARAMETER_ID = T3.ID
AND CU.PART_ID = T4.ID
) USAGE,

.
.
.
24 additional columns
.
.
.
FROM TABLE_1 T1,
TABLE_2 T2,
TABLE_3 T3,
TABLE_4 T4,
VIEW_1 V1,
VIEW_2 V2,
.
.
An additional mix of 7 tables and views
.
.
WHERE T1.COLUMN_4 = T2.COLUMN_5
AND T2.COLUMN_7 = T3.COLUMN_8
.
.
.
AND T5.ID = 'ABC123'
This time around the only surprise was the following runtime statistics:
Statistics
----------------------------------------------------------
2 recursive calls
0 db block gets
28316 consistent gets
0 physical reads
0 redo size
3328 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
1 rows processed
The new query returned the same data as the original query while performing a fraction of the work. Although 28,316 consistent gets is a bit high to return one row, this value is quite handsome when compared to the original value of 65 million consistent gets. This translates into a 99.96% improvement. Regarding response time, the original query took 13 minutes to complete while the new query only requires 0.3 seconds to generate results, also a 99.96% improvement.

Combining the right mix of database objects and SQL features helped me achieve such dramatic improvement. I hope my experience will help you consider creative solutions when faced with challenging SQL performance issues.
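If you want to experiment with the top-1-per-group rewrite described above, here is a minimal sketch of the same pattern using SQLite through Python (SQLite 3.25+ for window function support). The table and column names are invented to mirror the anonymized originals; this is an illustration of the technique, not the production query:

```python
import sqlite3

# Tiny stand-in for COMPLEX_VIEW: usage logs per part/parameter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE complex_view (
    part_id      TEXT,
    parameter_id TEXT,
    log_date     TEXT,
    usage        INTEGER
);
INSERT INTO complex_view VALUES
    ('XYZ', 'ABC', '2011-03-01', 10),
    ('XYZ', 'ABC', '2011-03-05', 42),
    ('XYZ', 'DEF', '2011-03-07', 7),
    ('QRS', 'ABC', '2011-03-02', 99);
""")

# Original pattern: correlated subquery fetching MAX(log_date) per group.
correlated = conn.execute("""
    SELECT cv.usage
    FROM complex_view cv
    WHERE cv.part_id = 'XYZ'
      AND cv.parameter_id = 'ABC'
      AND cv.log_date = (SELECT MAX(cv2.log_date)
                         FROM complex_view cv2
                         WHERE cv2.part_id = cv.part_id
                           AND cv2.parameter_id = cv.parameter_id)
""").fetchone()[0]

# Rewrite: one pass with ROW_NUMBER(), keeping only the newest row per group.
windowed = conn.execute("""
    SELECT usage
    FROM (SELECT cv.usage,
                 ROW_NUMBER() OVER (PARTITION BY cv.part_id, cv.parameter_id
                                    ORDER BY cv.log_date DESC) AS rn
          FROM complex_view cv
          WHERE cv.part_id = 'XYZ'
            AND cv.parameter_id = 'ABC')
    WHERE rn = 1
""").fetchone()[0]

print(correlated, windowed)  # both 42, since log_dates are unique per group
```

As in the article, the equivalence holds only because there are no duplicate LOG_DATEs within a group; with ties, ROW_NUMBER picks one arbitrary winner while the MAX approach would return every tied row.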

Wednesday, March 23, 2011

Fun with Tuning!

It's been an interesting and fun night.

Started, innocently enough, with a tweet (go figure, reason #1042 why Twitter rocks) about my CTAS operation completing.



That resulted in a flurry of activity and some actual learning on my part.

Of course you have the smart-ass (tweet), Matt Topper [@topperge]



I quickly did the math: it was only 4,919 times faster than mine. Though after tonight, I would have a hard time believing anything I say.

My CTAS operation created 102 GB of data in a shade over 7 hours. He did 70 TB per hour. Whatever. Showoff.

I need to back up a little actually. I have been posting these numbers over the last few days. Yesterday, Martin Berger [@martinberx] sees one of those tweets (a cry for help?), and follows up via email (he rocks anyway, he sent katezilla a postcard from Vienna last year too).

We've exchanged a few emails, mostly me telling him I have no idea what he's talking about and then me trying to explain what I am talking about. Or something.

Tonight (yesterday?) he asked for an AWR report. I told him I disabled everything via DBCA. He told me I probably didn't, nicely. Then he pointed me to Tim Hall's post on running awrrpt.sql. Following those easy instructions, I ran the report. I guess I didn't turn it off.

So far, Matt's a smart-ass, and Martin rocks.

Then Greg Rahn (@gregrahn) joins.



So I have cruddy disks? (read it again, Justice).

"Seems likely that the disk writes are the slow side of the execution. The read side probably faster. Got SQL Monitor report?"

I'm thinking, "...", actually, I wasn't. I was stuck on disk. But I could get him a SQL Dev report.

With that, he gets the SQL ID and tells me to do this:

Oracle 11g: Real-Time SQL Monitoring Using DBMS_SQLTUNE.REPORT_SQL_MONITOR

That's 3 years old. Wow.

BTW, here's the SQL statement I was running:
CREATE TABLE claim
COMPRESS BASIC
NOLOGGING
AS
SELECT /*+ PARALLEL( c, 8 ) */
date_of_service,
date_of_payment,
claim_count,
units,
amount,
...
...
( CASE
WHEN application_year IS NULL THEN '9'
ELSE application_year
END ) application_year,
( CASE
WHEN accounting_code IS NULL THEN '9'
ELSE accounting_code
END ) accounting_code,
( CASE
WHEN claim_form_type IS NULL THEN '9'
ELSE claim_form_type
END ) claim_form_type,
( CASE
WHEN diagnosis_code_1 IS NULL THEN '-999'
ELSE diagnosis_code_1
END ) diagnosis_code_1
...
...
FROM claims c;
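As an aside, each of those CASE expressions is just supplying a default when the column is NULL; COALESCE (or Oracle's NVL) says the same thing in one call. A quick SQLite-via-Python sketch with made-up rows, just to show the equivalence:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (application_year TEXT, diagnosis_code_1 TEXT)")
conn.executemany("INSERT INTO claims VALUES (?, ?)",
                 [("2010", "250.0"), (None, None)])

# CASE WHEN x IS NULL THEN 'default' ELSE x END  is  COALESCE(x, 'default')
rows = conn.execute("""
    SELECT COALESCE(application_year, '9')    AS application_year,
           COALESCE(diagnosis_code_1, '-999') AS diagnosis_code_1
    FROM claims
    ORDER BY rowid
""").fetchall()

print(rows)  # [('2010', '250.0'), ('9', '-999')]
```

Purely a readability tweak; it doesn't change what the CTAS produces.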
The night gets even crazier: old friend Tom Roach (@tomroachoracle) has now ssh'd into my VM. He's doing all kinds of craziness. He walked me through sar, iostat, and a few other tricks. At least now I know how to use the history command so I can replicate what he did.

Meanwhile, Matt is still heckling me and I get an email from Greg after reviewing the SQL Monitor report.

(I'm paraphrasing here)

"Take a look at your CTAS again, anything jump out at you?"

Me: (To myself): "No"
Me: (staring at the email)

"Perhaps you have a parallel hint on the select but not on the table, like this"
CREATE TABLE claim
COMPRESS BASIC
NOLOGGING
PARALLEL 8
AS
SELECT /*+ PARALLEL( c, 8 ) */
date_of_service,
date_of_payment,
claim_count,
units,
amount,
...
...
Wow. Really? How did he do that? Impressive. Cool!

I admit to Greg that I thought he was talking about hardware in his original tweet. He said something pithy. I bowed.

So that information (kinda important, huh?) coupled with Mr. Roach's look at what was happening using sar told me something...I remember this, CPU wasn't being fully utilized. I can't remember the exact numbers so let's call it 50%. I told him I was about to rerun (it's about 1:30 AM at this point), and he suggested upping the DOP to 16 from 8. Sure. I'll do what I'm told.

I reran the statement with the hint problem corrected and the DOP upped on both the table and the SELECT. As I was putting this together, it finished. 2.5 hours. Waaaay better than 7.5 hours. Tolerable for me since I'm not as cool as Matt (who was only on a 1/4 rack).

I learned stuff...mostly about how little I do know. I'll try to write up more notes in the future so I don't forget everything and so no one who helped me will have wasted their time.

Thanks so much to Tom, Martin and Greg, your help is greatly appreciated.

Monday, March 21, 2011

Fun with NFS Mount

This is just an ongoing saga for me. Nothing is easy. I'm pretty sure there is a great quote for that.

After my ORA-27054: NFS file system where the file is created or resides is not mounted with correct options issues last night (this morning), they just keep coming.

After running the mount command and then starting the backup process, I let it go until this morning. When I checked, it was complete. Yeah.

Checking the System Resource Monitor though, showed differently. I had less space than I did when I started (and quickly approaching no space). Further investigation showed that the mount command didn't work, at all. It didn't fail either though, so I assumed (again, I know) all was right in the world.
[root@medicaid oradata]# umount /media/test
umount: /media/test: not mounted
Not good.
[root@medicaid oradata]# mount
/dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/sda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw,size=7g)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
data on /media/sf_data type vboxsf (gid=501,rw)
exp on /media/sf_exp type vboxsf (gid=501,rw)
It's not listed. The ls command displayed the files...but they weren't on the share.

So I started to remove items from the mount command ending up with this:
[root@medicaid media]# mount -t nfs oraclenerd:/media/data_2/files/oradata/TESTING /media/test
mount: mount to NFS server 'oraclenerd' failed: Connection refused.
Ah, look at that, an error. Sweet. Where was that before? What did I do wrong?

After a bit of looking around, I found this Ubuntu guide on setting up NFS.

Looks like I was missing the nfs-kernel-server package. NFS = Network File System. I'm slow.

From the introduction of the How To document:
NFS (Network File System) allows you to 'share' a directory located on one networked computer with other computers/devices on that network. The computer 'sharing' the directory is called the server and the computers or devices connecting to that server are called clients. The clients 'mount' the shared directory, it becomes part of their own directory structure.

NFS is perfect for a NAS (Networked Attached Storage) deployment in a Linux/Unix environment. It is a native Linux/Unix protocol as opposed to Samba which uses the SMB protocol developed by Microsoft. The Apple OS has good support for NFS. Windows 7 has some support for NFS.

Sounds like what I am trying to do.

Install the package, then configure the exports file (/etc/exports) with the following:
/media/data_2 medicaid(rw,sync,no_subtree_check)
Then run the original mount command.
[root@medicaid media]# mount -o rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 -t nfs 
oraclenerd:/media/data_2/files/oradata/TESTING /media/test
Mounted.
[root@medicaid media]# touch test/text.txt
File created.

ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

All I seem to post are my issues...well, technical issues.

On Friday I encountered ORA-08103: object no longer exists. The result there, I had to re-re-re-re-re-load the files. Once I did that, I realized that the file name had actual meaning...In the table itself was a column called MONTH which was relative to the fiscal year (July - June). Guess where fiscal year was? Yup, the file name. Ugh.

So, as soon as I got the data loaded, I did this:
RMAN>BACKUP DATABASE;
I didn't want to go through the corrupted block issue for a 3rd time. After some more work, moving data around, I checked out the RMAN docs and decided to perform an incremental backup. Well, I'm still learning; I apparently did the whole thing.
RMAN>BACKUP DATABASE INCREMENTAL LEVEL 0;
OK, now I'm running out of space, I have 500 GB allocated to this VM.

I read through the RMAN docs some more, BACKUP BACKUPSET. Yeah, backing up your backup. Inception.

I wanted to save this to my Shared Folders that now auto-mount (as of VirtualBox 4.0).
RMAN> BACKUP BACKUPSET 4
2> FORMAT '/media/sf_files/oradata/TESTING/20110320_%u'
3> DELETE INPUT;

Starting backup at 20-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=13, stamp=746274912, piece=1
channel ORA_DISK_1: starting piece 1 at 20-MAR-11
channel ORA_DISK_1: backup piece /u01/app/oracle/product/11.2.0/dbhome/dbs/0dm7mg30_1_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/20/2011 21:36:53
ORA-19504: failed to create file "/media/sf_files/oradata/TESTING/20110320_0dm7mg30"
ORA-27054: NFS file system where the file is created or resides is not mounted with correct options
Additional information: 3
ORA-19600: input file is backup piece (/u01/app/oracle/product/11.2.0/dbhome/dbs/0dm7mg30_1_1)
ORA-19601: output file is backup piece (/media/sf_files/oradata/TESTING/20110320_0dm7mg30)
Checking the Google Machine took me to Kevin Closson's post Mount Options for Oracle over NFS. It’s All About the Port. Those options are the following:
rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0
OK, I'm still new to Linux and now my Shared Folders auto-mount, WTF do I do? More Google Machine goodness took me to a post on the OTN Forums with the exact command I needed to run (which is of course, why I am writing this up now...).
[root@medicaid media]# mount -o rw,bg,hard,nointr,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0 -t nfs 
oraclenerd:/media/data_2/files/oradata/TESTING /media/test
See that little trick I picked up? I used oraclenerd:/media/data_2/... instead of relying on the VirtualBox magic. Nice.

No errors, I then checked out the permissions:
[root@medicaid media]# ls -la
total 36
drwxr-xr-x 6 root root 4096 Mar 21 01:37 .
drwxr-xr-x 28 root root 4096 Mar 18 02:13 ..
-rw-r--r-- 1 root root 0 Mar 18 02:13 .hal-mtab
drwxrwx--- 1 root vboxsf 4096 Mar 4 04:18 sf_data
drwxrwx--- 1 root vboxsf 8192 Mar 8 13:59 sf_exp
drwxr-xr-x 1 root vboxsf 4096 Mar 20 21:18 sf_files
drwxr-xr-x 2 root root 4096 Mar 21 01:37 test
OK, let's change those:
[root@medicaid media]# chown oracle:dba test/ -R
[root@medicaid media]# ls -la
total 36
drwxr-xr-x 6 root root 4096 Mar 21 01:37 .
drwxr-xr-x 28 root root 4096 Mar 18 02:13 ..
-rw-r--r-- 1 root root 0 Mar 18 02:13 .hal-mtab
drwxrwx--- 1 root vboxsf 4096 Mar 4 04:18 sf_data
drwxrwx--- 1 root vboxsf 8192 Mar 8 13:59 sf_exp
drwxr-xr-x 1 root vboxsf 4096 Mar 20 21:18 sf_files
drwxr-xr-x 2 oracle dba 4096 Mar 21 01:37 test
Sweet. Now, run the RMAN backup of a backup again:
RMAN> BACKUP BACKUPSET 4
2> FORMAT '/media/test/2011_%u'
3> DELETE INPUT;

Starting backup at 21-MAR-11
using channel ORA_DISK_1
channel ORA_DISK_1: input backup set: count=13, stamp=746274912, piece=1
channel ORA_DISK_1: starting piece 1 at 21-MAR-11
channel ORA_DISK_1: backup piece /u01/app/oracle/product/11.2.0/dbhome/dbs/0dm7mg30_1_1
No errors...and it seems to be running fine. If it breaks, I'll update.

Update: 03/21/2011 02:41 AM
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_DISK_1 channel at 03/21/2011 02:39:39
ORA-19502: write error on file "/media/test/2011_0dm7mg30", block number 10173824 (block size=8192)
ORA-27072: File I/O error
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 4
Additional information: 10173824
Additional information: 630784
ORA-19502: write error on file "/media/test/2011_0dm7mg30", block number 10173824 (block size=8192)
Barnacles.

Friday, March 18, 2011

ORA-08103: object no longer exists

From the docs:
ORA-08103: object no longer exists
Cause: The object has been deleted by another user since the operation began, or a prior incomplete recovery restored the database to a point in time during the deletion of the object.
Action: Delete the object if this is the result of an incomplete recovery.

It started innocently enough.



Weird, I can see it in the schema browser.



I can select it from the schema browser and view the table properties.



I can issue a SELECT *



Strangely, the Count Rows just fails silently.

Enough of this GUI crap.

From the server itself, but first, the specs:
Host: Ubuntu 10.10 Desktop
RAM: 24 GB
CPU: Dual Xeon QuadCore something or another
VirtualBox: 4.0.4 r70112

Guest: Oracle Enterprise Linux 5.5 (Carthage)
RAM: 12 GB
Processors: 4
Shared Folders: 3
Guest Additions installed.
Database: 11gR2

Describe the table:
sql>desc eligibility
Name Null? Type
----------------------- -------- ----------------
FILE_ID NUMBER
RECORD_ID NUMBER(38)
RECIPIENT_ID VARCHAR2(12)
MONTH_OF NUMBER(4)
DISTRICT_CODE VARCHAR2(2)
COUNTY VARCHAR2(2)
ELIGIBILITY_FLAG VARCHAR2(1)
HMO_FLAG VARCHAR2(1)
NURSING_HOME_FLAG VARCHAR2(1)
MEDIPASS_FLAG VARCHAR2(1)
AGE NUMBER(38)
PROGRAM_CODE VARCHAR2(4)
GENDER VARCHAR2(1)
Good.

SELECT COUNT(*)
sql>SELECT COUNT(*) FROM eligibility;
SELECT COUNT(*) FROM eligibility
*
ERROR at line 1:
ORA-08103: object no longer exists
OK.
sql>SELECT * FROM eligibility WHERE rownum < 11;

FILE_ID RECORD_ID RECIPIENT_ID MONTH_OF DI CO E H N M AGE PROG G
---------- ---------- ------------ ---------- -- -- - - - - ---------- ---- -
52 1892424743 ACF9ASfEJrVw 1 07 48 6 0 0 0 27 MM P F
52 1892424744 ACF9ASfEJrVw 2 07 48 6 0 0 0 27 MM P F
52 1892424745 ACF9ASfEJrVw 3 07 48 6 0 0 0 28 MM P F
52 1892424746 ACF9ASfEJrVw 4 07 48 6 0 0 0 28 MM P F
52 1892424747 ACF9ASfEJrVw 5 07 48 1 0 0 0 28 MA R F
52 1892424748 ACF9ASfEJrVw 6 07 48 1 0 0 0 28 MA R F
52 1892424749 ACF9ASfEJrVw 7 07 48 1 0 0 0 28 MA R F
52 1892424750 ACF9ASfEJrVw 8 07 48 1 0 0 0 28 MA R F
52 1892424751 ACF9ASfEJrVw 9 07 48 1 0 0 0 28 MA R F
52 1892424752 ACF9ASfEJrVw 10 07 48 1 0 0 0 28 MA R F

10 rows selected.
Hah. You are there, sort of. Try to fool it.
sql>SELECT COUNT(*) FROM ( SELECT * FROM eligibility );
SELECT COUNT(*) FROM ( SELECT * FROM eligibility )
*
ERROR at line 1:
ORA-08103: object no longer exists
Barnacles.

If I remember correctly, this table took about 10 hours to load. Sometime later, I had another corrupted block; fortunately it was on a small table, so I dropped the table, the tablespace, and the associated data file. The next time I restarted the server, I was told to run fsck. Not good.

I ran it with the -c switch and was able to get back in. Now this.

This is a DW type environment. ARCHIVELOG is not turned on. There is no backup.

So, should I drop and reload the dang thing? Or is it somehow recoverable?

Help. Winning answer will get a t-shirt.

Update 03/18/2011 10:46 PM EST
I tried Gary's method from below first, as it seemed the easiest to test out. There was a single bitmap index on the table, so I booted it:
DROP INDEX bmp_fileid_eligibility;
Now the test:
SELECT COUNT(*) FROM eligibility;
Waiting...
ORA-01578: ORACLE data block corrupted (file # 9, block # 2440127)
ORA-01110: data file 9: '/u01/app/oracle/oradata/TESTING/medicaid_01_04.dbf'
01578. 00000 - "ORACLE data block corrupted (file # %s, block # %s)"
*Cause: The data block indicated was corrupted, mostly due to software
errors.
*Action: Try to restore the segment containing the block indicated. This
may involve dropping the segment and recreating it. If there
is a trace file, report the errors in it to your ORACLE
representative.
Great. Same situation as last time. I ended up rebuilding the entire VM. I've never had these kinds of problems before...perhaps there is something wrong with my system setup or possibly how I set up Linux? I don't know. Sadly, I currently lack the skills necessary to figure it out. More importantly, I lack the time.

Following Martin's suggestion from last time, I have determined that the corrupt block belongs to the table and not the index (duh, I just dropped it and got that error). Next up, Joel's suggestion, CTAS.

Update 03/19/2011 1:46 AM EST
No go with CTAS
Error starting at line 1 in command:
CREATE TABLE elig
COMPRESS BASIC
NOLOGGING
AS
SELECT *
FROM eligibility
Error at Command Line:6 Column:5
Error report:
SQL Error: ORA-01578: ORACLE data block corrupted (file # 9, block # 2440127)
ORA-01110: data file 9: '/u01/app/oracle/oradata/TESTING/medicaid_01_04.dbf'
01578. 00000 - "ORACLE data block corrupted (file # %s, block # %s)"
*Cause: The data block indicated was corrupted, mostly due to software
errors.
*Action: Try to restore the segment containing the block indicated. This
may involve dropping the segment and recreating it. If there
is a trace file, report the errors in it to your ORACLE
representative.
Update 03/19/2011 3:46 AM EST
After reviewing my notes, I decided to drop and reload. This wasn't the large table that I have; it was only 10 GB or so of compressed data. I was able to drop the load time by 67% by moving the parallel hint from the INSERT section to the SELECT section. Originally it looked like this:
INSERT /*+ append parallel( e, 8 ) */ INTO claims
SELECT
col1,
...,
col10
FROM ext_claims e;

COMMIT;
Changing it to this sped things up dramatically:
INSERT /*+ append */ INTO claims
SELECT /*+ parallel( e, 8 ) */
col1,
...,
col10
FROM ext_claims e;
After it was complete this time...
SQL>SHUTDOWN IMMEDIATE;
SQL>STARTUP MOUNT;
SQL>host

[oracle@medicaid ~]$ rman target=sys/testing@testing

Recovery Manager: Release 11.2.0.1.0 - Production on Sat Mar 19 04:30:46 2011

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: TESTING (DBID=89312249)


RMAN> backup database;

Starting backup at 19-MAR-11
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=91 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
...
...
...
input datafile file number=00001 name=/u01/app/oracle/oradata/TESTING/system01.dbf
input datafile file number=00002 name=/u01/app/oracle/oradata/TESTING/sysaux01.dbf
input datafile file number=00005 name=/u01/app/oracle/oradata/TESTING/example01.dbf
input datafile file number=00003 name=/u01/app/oracle/oradata/TESTING/undotbs01.dbf
input datafile file number=00004 name=/u01/app/oracle/oradata/TESTING/users01.dbf
channel ORA_DISK_1: starting piece 1 at 19-MAR-11
channel ORA_DISK_1: finished piece 1 at 19-MAR-11
piece handle=/u01/app/oracle/product/11.2.0/dbhome/dbs/07m7j6h6_1_1 tag=TAG20110319T043341 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 01:05:26
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
including current SPFILE in backup set
channel ORA_DISK_1: starting piece 1 at 19-MAR-11
channel ORA_DISK_1: finished piece 1 at 19-MAR-11
piece handle=/u01/app/oracle/product/11.2.0/dbhome/dbs/08m7jabs_1_1 tag=TAG20110319T043341 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 19-MAR-11
Now I have a good backup. Will continue to do this after anything major changes.

Still need to figure out what is happening, Gary's suggestion below to check out the BLOCK related init parameters will be a start.

Monday, March 14, 2011

Fun with SQL - Part 578

by Enrique Aviles [twitter]
The first guest post in a while. I met Mr. Aviles via twitter (yeah for Twitter!) some time back. We got to meet in real life at our local SOUG meeting when Tanel Põder came to town. He's also graced the front page for a while wearing his oraclenerd t-shirt (which is now his avatar on Twitter).

Part of my job consists of tuning SQL queries. Opportunities abound and present themselves in many different scenarios. A few weeks ago I was assigned a trouble ticket. I was the lucky winner because, after some investigation by the developers, the culprit was identified as an SQL issue that was causing a stack overflow in Hibernate. For those unfamiliar with it, Hibernate is an Object Relational Mapping library for Java (more info here). The stack overflow occurred during string manipulation, probably during the code generation phase. I didn’t have to do much investigation of my own because the developers indicated the offending query was composed of 60 IN clauses, each containing 1,000 elements. The query resembled something like this:
SELECT COL1, COL2, COL3
FROM TABLE
WHERE UNIQUE_ID IN (1, 2, 3,..., 1000)
OR UNIQUE_ID IN (1001, 1002, 1003,..., 2000)
OR UNIQUE_ID IN (2001, 2002, 2003,..., 3000)
...
OR UNIQUE_ID IN (59001, 59002, 59003,..., 60000)
I was happy to see the query generation failed before it reached the database. It is usually better to have the application fail with a glaring error during unit tests than to allow a potential failure to slip into production. Fortunately, we test against a database that’s a slightly older copy of the production database, so we execute all our queries using real data volumes, not just against tiny, artificial development databases. Had we only tested on a development database, this issue would have made it to production with two possible outcomes:
- The application fails with the Hibernate stack overflow error that triggered the trouble ticket.
- Given a smaller set of unique ids, Hibernate successfully generates the query and sends it to the database.

Since the query never reached the database, it is hard to tell how well it would have performed.

For a handful of IN clauses, it is safe to assume the query would have run in a reasonable amount of time but 60 IN clauses could have possibly taken an unacceptable amount of time. Even if the database is able to process 60 IN clauses with no impact to the user, generating such statements as
UNIQUE_ID IN (1001, 1002,..., 2000)
UNIQUE_ID IN (2001, 2002,..., 3000)
UNIQUE_ID IN (59001, 59002, 59003,..., 60000)
in a query is not a good idea for various reasons. In the absence of bind variables, a lot of CPU will be consumed hard parsing SQL statements since potentially every generated query becomes a brand new query sent to the database even if the number of IN clauses remain static. Using bind variables lessens the load but doesn’t eliminate hard parsing as much as it should because queries with a different number of IN clauses become unique, thus unshareable. Both cases fill the shared pool with unshareable SQL, although the absence of bind variables would fill it faster. This is a waste of CPU cycles and memory. We can significantly reduce hard parsing and eliminate the risk of generating a huge SQL statement by implementing a temporary table.

Instead of selecting and sending to the application server all values that eventually ended in a series of IN clauses, we could simply insert those values in a temporary table and modify the original query to join the temporary table. The following is a sample temporary table:
CREATE GLOBAL TEMPORARY TABLE IDS_TMP 
(
ID NUMBER
)
ON COMMIT PRESERVE ROWS;
The table was loaded using the same SELECT that originally returned the IDs used to build the IN clauses. Once the temporary table contained all desired IDs, the application generated the following query:
SELECT COL1, COL2, COL3
FROM TABLE TAB
WHERE EXISTS
(SELECT TMP.ID
FROM IDS_TMP TMP
WHERE TMP.ID = TAB.ID);
Of course, we could have also generated a query with an INNER JOIN or with a single IN clause but EXISTS worked fine for our case. This is obviously a much shorter SQL text. It remains static regardless of how many IDs are selected since they are hidden in the temporary table. Additionally, there is no need to send all IDs to the application server, spend time building a long character string, and send them back as part of a SQL query. This translates into one hard parse and many executions, so the risk of wasting shared pool and CPU usage is eliminated.

The “magic” of temporary tables makes this a safe solution since data inserted is only available per session. User A cannot select data inserted by user B and vice versa. Temporary data disappears automatically after the session disconnects so there is no need to explicitly DELETE or TRUNCATE the table. At the end of the day, this proved to be a suitable solution.
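To make the idea concrete, here is a small sketch using SQLite's TEMP tables through Python. SQLite TEMP tables are per-connection, loosely analogous to a per-session Oracle global temporary table; the table names and row counts below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 5001)])

# 2,500 ids -- too many for a single Oracle IN list (1,000-element limit).
wanted = list(range(100, 2600))

# TEMP table lives only in this connection, like a per-session Oracle GTT.
conn.execute("CREATE TEMP TABLE ids_tmp (id INTEGER)")
conn.executemany("INSERT INTO ids_tmp VALUES (?)", [(i,) for i in wanted])

# One static statement no matter how many ids were loaded: one hard parse,
# versus a freshly generated (unshareable) SQL text per batch of IN lists.
rows = conn.execute("""
    SELECT o.id, o.col1
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM ids_tmp t WHERE t.id = o.id)
""").fetchall()

print(len(rows))  # 2500
```

The key property mirrors the article's point: the SQL text never changes, only the contents of the temporary table do.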

The fact that IN clauses only support 1,000 elements was seen as a limitation. If the database imposes a limit, chances are that defeating it with apparently clever workarounds will result in slow-performing queries and an increased risk of wasting system resources. When faced with similar limitations, we should ask ourselves whether there are other means of achieving the same result that don’t require circumventing what was considered a database shortcoming.

Thursday, March 10, 2011

Follow Friday

I know this is a Twitter tradition and I know it is just a tad early, but here is my version of Follow Friday.

First up, Enrique Aviles. I met Mr. Aviles at Tanel Põder's visit back in January in Tampa. We had corresponded a bit prior via Twitter. For the longest time I read his Twitter handle as "Evil Les 94"; I hope that's not me projecting. Super nice guy, and he apparently has some fun horror stories he promises to share.



Next up, Don Seiler. Don and I have been verbal sparring partners on Twitter for a couple of years now. We appear to have a similar sense of humor: smart-ass. That might be a company-wide trait at Pythian; I seem to encounter a lot of smart-asses from that particular organization. :)



Finally, Lisa Dobson. My first introduction to Mrs. Dobson was through Tom Kyte's blog a few years ago. Since I can't seem to find the exact post that I think I remember, I link to this one where Mr. Kyte talks about introducing her at her presentation for Newbie DBAs.

Now, I'm not one to spread rumors, so don't mention this to anyone, but rumor has it, she "bitch slapped" Tom Kyte at some past event. This is wholly unsubstantiated, so don't hold me to it.



You may see a trend in my Follow Friday suggestions, you may not. If you do, go here, all proceeds go to katezilla (well, technically, to us, her parents, to offset her costs. Though she is a girl and girls are supposed to cost more...). :p