Wednesday, March 23, 2011

Fun with Tuning!

It's been an interesting and fun night.

Started, innocently enough, with a tweet (go figure, reason #1042 why Twitter rocks) about my CTAS operation completing.



That resulted in a flurry of activity and some actual learning on my part.

Of course you have the smart-ass (tweet), Matt Topper [@topperge]



I quickly did the math, it was only 4,919 times faster than mine. Though after tonight, I would have a hard time believing anything I say.

My CTAS operation created 102 GB of data in a shade over 7 hours. He did 70 TB per hour. Whatever. Showoff.

I need to back up a little actually. I have been posting these numbers over the last few days. Yesterday, Martin Berger [@martinberx] sees one those tweets (a cry for help?), and follows up via email (he rocks anyway, he sent katezilla a postcard from Vienna last year too).

We've exchanged a few emails, mostly me telling him I have no idea what he's talking about and then me trying to explain what I am talking about. Or something.

Tonight (yesterday?) he asked for an AWR report. I told him I disabled everything via DBCA. He told me I probably didn't, nicely. Then he pointed me to Tim Hall's post on running awrrpt.sql. Following those easy instructions, I ran the report. I guess I didn't turn it off.

So far, Matt's a smart-ass, and Martin rocks.

Then Greg Rahn (@gregrahn) joins.



So I have cruddy disks? (read it again Justice).

"Seems likely that the disk writes are the slow side of the execution. The read side probably faster. Got SQL Monitor report?"

I'm thinking, "...", actually, I wasn't. I was stuck on disk. But I could get him a SQL Dev report.

With that, he gets the SQL ID and tells me to do this:

Oracle 11g: Real-Time SQL Monitoring Using DBMS_SQLTUNE.REPORT_SQL_MONITOR

That's 3 years old. Wow.

BTW, here's the SQL statement I was running:
CREATE TABLE claim
COMPRESS BASIC
NOLOGGING
AS
SELECT /*+ PARALLEL( c, 8 ) */
date_of_service,
date_of_payment,
claim_count,
units,
amount,
...
...
( CASE
WHEN application_year IS NULL THEN '9'
ELSE application_year
END ) application_year,
( CASE
WHEN accounting_code IS NULL THEN '9'
ELSE accounting_code
END ) accounting_code,
( CASE
WHEN claim_form_type IS NULL THEN '9'
ELSE claim_form_type
END ) claim_form_type,
( CASE
WHEN diagnosis_code_1 IS NULL THEN '-999'
ELSE diagnosis_code_1
END ) diagnosis_code_1
...
...
FROM claims c;
The night gets even crazier, old friend, Tom Roach (@tomroachoracle) has now ssh'd into my VM. He's doing all kinds of craziness. He walked me through sar, iostat, and a few other tricks. At least now I know how to use the history command so I can replicate.

Meanwhile, Matt is still heckling me and I get an email from Greg after reviewing the SQL Monitor report.

(I'm paraphrasing here)

"Take a look at your CTAS again, anything jump out at you?"

Me: (To myself): "No"
Me: (staring at the email)

"Perhaps you have a parallel hint on the select but not on the table, like this"
CREATE TABLE claim
COMPRESS BASIC
NOLOGGING
PARALLEL 8
AS
SELECT /*+ PARALLEL( c, 8 ) */
date_of_service,
date_of_payment,
claim_count,
units,
amount,
...
...
Wow. Really? How did he do that? Impressive. Cool!

I admit to Greg that I thought he was talking about hardware in his original tweet. He said something pithy. I bowed.

So that information (kinda important huh?) couple with Mr. Roach's look at what was happening using sar told me something...I remember this, CPU wasn't being fully utilized. I can't remember the exact numbers so let's call it 50%. I told him I was about to rerun (it's about 1:30 AM at this point), he suggested upping the DOP to 16 from 8. Sure. I'll do what I'm told.

I reran the statement with the bug fix corrected and upping the DOP on both the table and the SELECT. As I was putting this together, it finished. 2.5 hours. Waaaay better than 7.5 hours. Tolerable for me since I'm not as cool as Matt (who was only on a 1/4 rack).

I learned stuff...mostly about how little I do know. I'll try to write up more notes in the future so I don't forget everything and so no one who helped me will have wasted their time.

Thanks so much to Tom, Martin and Greg, your help is greatly appreciated.

10 comments:

Sidhu said...

Good one...nicely summarized !

Martin Berger said...

What's about NVL function instead of CASE ... IS NULL ...?
Should save CPU.

Enrique Aviles said...

Interesting and informative. This is the type of thing that gets me going. Exadata is fast but so much can still be done at the software level. Hardware solutions should always be the very last option. Thank you for sharing!

Tom said...

Yeah, I didn't look at your statement (hides in a hole somewhere).

Joel Garry said...

Tell Matt we'll be impressed when he runs Exadata on a laptop.

Tom said...

Im running Exadata on a laptop :)

Not kidding...

Tom said...

Enrique, organizations buy Exadata for many different reasons.

They are do for a hardware refresh and want to see what Exadata gives them that other platforms won't.

They want to consolidate.

And yes, sometimes there just needs to be a specific problem that can't be solved without Exadata. With that said in this case, they don't just buy Exadata. The typically exhaust all their tuning options and are desperate to fix some specific issues around performance/managing.

oraclenerd said...

@martin

I hadn't even gotten that far yet, but good idea.

oraclenerd said...

@tom

be careful Tom...you'll be classified with Matt soon. :p

oraclenerd said...

@enrique

i'm with you, but it seems most people/organizations don't want to spend the time tuning, re-designing, etc...they just want to buy hardware/software.

Google Linkshare + Exadata, from what I can tell, they exhausted every conceivable means...they ended up chucking DB2 and consolidating on a 1/4 rack and got much better...everything.

chet