Monday, May 17, 2010

Thoughts on Exadata

I was exposed to Exadata a few weeks ago and my brain has been churning ever since.

Unfortunately, I can't speak about specifics (you know the first rule of Exadata, don't you?). I can, however, speak about how I think it can be applied (I hope).

I came away from that experience in sheer awe. To the point where I am rethinking the entire database landscape.

You may have read somewhere that, with Exadata, you have to relearn your use of indexes. In other words, you don't need them (well, I'm sure there are exceptions) beyond those supporting referential integrity.

What about materialized views? Indexes (the non-RI kind, anyway) and materialized views are basically work-arounds for a lack of raw power. I saw Exadata scan 45 million rows (in an Explain Plan) and immediately said that it should be materialized, that you don't, or shouldn't, normally scan that many records. Then I started to think about it...why? Why materialize it if you can scan it in mere seconds?

Materialized views and indexes both require support to some degree or another. Both take development time in one way or another. With Exadata, you might just not need them anymore. That's a good thing.
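To make that concrete, here's a rough sketch (table and column names invented for illustration) of the kind of materialized view you'd traditionally build to avoid repeated big scans, next to the query Exadata might just run directly:

    -- Traditional approach: pre-aggregate so reports never touch the detail rows
    CREATE MATERIALIZED VIEW sales_by_region_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
    AS
    SELECT region_id,
           TRUNC(sale_date, 'MM') AS sale_month,
           SUM(amount)            AS total_amount
    FROM   sales
    GROUP  BY region_id, TRUNC(sale_date, 'MM');

    -- The Exadata argument: skip the MV and just run the aggregate cold
    SELECT region_id,
           TRUNC(sale_date, 'MM') AS sale_month,
           SUM(amount)            AS total_amount
    FROM   sales
    GROUP  BY region_id, TRUNC(sale_date, 'MM');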

Taking it one step further, do you even need a data warehouse anymore? I'm not talking about the top 5% of shops out there; I'm talking about the smaller shops that are not processing thousands of transactions a second.

Think about why you build a data warehouse. Data warehouses are designed to make reporting easier by (1) creating a standalone instance that has its own resources, and (2) creating a design (denormalized in some cases) that makes it very easy to get at the data.

I may be crazy, but why create all that extra work if you don't need to? By going the DW route, you now have ETL routines, a separate database, and a separate design, all of which need to be maintained. It's doubtful the same person will be doing everything, so you hire more staff.

(Remember, I'm talking about the 95% here).

The raw power of Exadata would allow you to do everything in a single location.

I've had private conversations with people about this very subject...some think I'm off my rocker (naturally), some haven't been exposed to it and are wary of speculation, and others see some merit in my rambling.

I'm very excited about the possibilities with Exadata. What say you?

Added January 26, 2011
A great post by Jeff McQuigg: OOW BI Imp Panel Questions #3: Can Exadata Replace a DW? - I asked this question at the OOW panel, but I'm not sure if his post is directly related, as my question was rushed so everyone could go see Larry's speech. Good read.

9 comments:

Stewart Bryson said...

Chet:

Dimensional models aren't just about performance... they are also about simplifying the model to make reporting less daunting to end users. Tools like OBIEE can make this easier, using a logical star schema to present the OLTP database as if it were a data warehouse. However, try conforming four or five source systems into a logical star schema in OBIEE and it becomes very, very difficult. You end up with one or more logical table sources for each dimension table and fact table, multiplied by the number of source systems (usually). You'll also end up with numerous aliases per source-system table to ensure the tables join more like a denormalized model and less like a transactional system.

There will always be reasons why ETL is necessary. Many source systems don't track history at all... or only store LAST_UPDATE dates without the prior values. How about slowly-changing dimensions? When a customer moves from New York to Atlanta, does it make sense that every order she's placed while living in New York should now be attributed to Georgia?
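For anyone who hasn't built one, a Type 2 slowly-changing dimension keeps the old row around with effective dates so that history stays where it happened. A minimal sketch (table and values invented):

    -- Type 2 customer dimension: a move closes the old row and opens a new one
    CREATE TABLE customer_dim (
      customer_key   NUMBER PRIMARY KEY,  -- surrogate key, one per version
      customer_id    NUMBER,              -- natural key from the source system
      city           VARCHAR2(50),
      state          VARCHAR2(2),
      effective_from DATE,
      effective_to   DATE                 -- NULL means "current row"
    );

    -- The customer moves from New York to Atlanta
    UPDATE customer_dim
    SET    effective_to = DATE '2010-05-01'
    WHERE  customer_id = 42
    AND    effective_to IS NULL;

    INSERT INTO customer_dim
    VALUES (1002, 42, 'Atlanta', 'GA', DATE '2010-05-01', NULL);

    -- Orders placed before the move still join (via customer_key) to the
    -- New York row, so historical reports stay attributed to New York.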

The finance guys just love it when fiscal-year reports continue to change long after the books are closed.

Stewart

oraclenerd said...

You definitely have me on multiple source systems. I won't argue that one.

What if you could design the OLTP system the way you want/need it? Either using from and to dates, or shadow tables, for maintaining history.

I agree, the DW simplifies the model greatly for end users, but couldn't you create a logical model to get the same simplification?

I do hear you...it was a blanket statement that doesn't cover outliers. If you could start your OLTP today on The Database Machine, how would that affect everything else that you would normally develop?

That might be a better way to phrase it actually. Perhaps a follow-up (in addition to my other follow-ups of course) is necessary...

chet

Noons said...

Keep something in mind when talking about MVs and "no need for them if you can scan 45 million rows in a few seconds":

for every system where we can scan 45M rows in seconds, I'll show you a demand for MANY such *concurrent* scans.

Dump many of those - concurrent! - on Exadata and, fast or not, I can guarantee a (costly) performance disaster!

Then it's back to MVs - or whatever the solution might be: just using them as an example here.

Where I feel Exadata does a great job is in allowing us to resync a complex MV in next to no time, whereas before we'd need a few hours/days.
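To be concrete about what I mean by "resync": a complete refresh, something like this (MV name made up):

    -- Complete ('C') refresh: rebuild the MV from its base tables
    EXEC DBMS_MVIEW.REFRESH('SALES_BY_REGION_MV', method => 'C');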

And of course the multitude of other db operations where once-off, very fast scans are needed: odd once-off reports, index builds, big sorts, the odd badly worded query, etc., etc.

Let me just insert an aside here: we run a DW without MVs (unfortunately: working on it at the moment!) and now for the first time with partitioning.
We're getting index reduction AND one order of magnitude faster queries just by effectively using partitioning. That feature would be, IMHO, the "poor man's Exadata" at the moment.
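Roughly what I mean (a simplified sketch, not our actual schema):

    -- Range-partition the big table by month so the optimizer can prune
    CREATE TABLE sales_fact (
      sale_date  DATE,
      product_id NUMBER,
      amount     NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p2010_04 VALUES LESS THAN (DATE '2010-05-01'),
      PARTITION p2010_05 VALUES LESS THAN (DATE '2010-06-01')
    );

    -- Touches only partition p2010_05 instead of scanning the whole table
    SELECT SUM(amount)
    FROM   sales_fact
    WHERE  sale_date >= DATE '2010-05-01'
    AND    sale_date <  DATE '2010-06-01';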

Haven't yet got my hands on an Exa-box but the way data volumes are growing at our DC, we'll need one in another two or three years.

Meanwhile I'll partition and MV as much as I can, in preparation. And yes, I'll reduce indexes! :)

oraclenerd said...

@noons

You hit it. DOP and partitioning are keys to Exadata (so I've read anyway ;).

I wouldn't say MVs should go away completely, but it's fun sometimes to make a blanket statement like that, as you'll most definitely get a response.

I think that one query I saw, if run by multiple users at the same time, would have done some damage. Some of that would be offset by your DOP (if I'm not mistaken), and I'm not sure we had the optimal setup on Exadata at the time (as far as DOP).

I didn't even get to mention all that other stuff you talked about. Everything is faster with Exadata. I wish I could have tried out my former 8-hour ETL process (no delta, of course) on Exadata. It would have ended up at around 8 minutes, or even 8 seconds.

I guess the point I was trying to make was that this machine does offer a paradigm shift in how we think about and design databases (DW in particular, if you even need one) :)

Stewart Bryson said...

With the source systems either running on Exadata or replicated to an ODS-type environment on Exadata, I think that gets us closer. That way, the drill-across for fact tables done in OBIEE could be pushed completely down to Exadata, removing the BI Server from the equation. That would scream.

However... modeling OLTP with OBIEE can be very difficult. Perhaps there will be features in 11g that make this easier... but at the moment, the multiple alias tables and LTS's required for conforming source systems can be more work, and require more "development", than doing something comparable in ETL.

I've heard Oracle sales from time to time propose what you're talking about here... and I do think for some environments it would make a lot of sense. However, it would be a Herculean task to try to do a real conformed, business-wide EDW in this way. You put the disclaimer in the initial post about outliers, but I see companies of all sizes doing EDWs sourced from multiple transactional systems.

If I could design the source system from the ground up to do reporting, then yes... I think this could work very effectively. That would make a very interesting case study.

Great stuff as usual.

Stewart Bryson said...

My experience with Exadata is that parallelism is not nearly as important as it used to be once SmartScan gets into the picture.

When IO is a bottleneck, the usual approach is to attack the query with lots of threads, decreasing IO waits at the expense of CPU waits, which is usually a reasonable exchange considering the high amount of IO waits a typical DW system experiences.

However, with SmartScan, IO waits drop significantly, and parallelism can actually add unneeded overhead. It was quite a paradigm shift for me to actually decrease parallelism to get better response times... I suddenly felt out of place.
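In hint terms, the shift looks something like this (a sketch, table name invented):

    -- The old DW instinct: attack the scan with lots of parallel slaves
    SELECT /*+ PARALLEL(s, 16) */ SUM(amount) FROM sales s;

    -- With SmartScan filtering rows down in the storage cells, a lower
    -- DOP (or none at all) can actually come back faster
    SELECT /*+ PARALLEL(s, 2) */ SUM(amount) FROM sales s;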

oraclenerd said...

Stewart

re: SmartScan and Parallelism

It was fun watching the DBA find the sweet spot for DOP on Exadata. Your observation holds true from what I noticed (as a passive participant). For certain queries, higher was better; for others, lower was better. I didn't quite understand the reasoning behind it (I'm not a real DBA, I just play one on TV).

Bradd Piontek said...

meh

oraclenerd said...

@bradd

"meh"?

WTF?

I know you got more words than that.