Tuesday, November 24, 2009

Playing With PARALLEL

I'm trying to learn about PARALLEL and how it affects SELECT statements in a DW/DSS (non-OLTP) environment. I can't really say I know what I'm doing or what I'm looking at, but I'm trying.

I had a DBA strongly discourage me from using PARALLELism on reporting tables (i.e. facts or dimensions). I think he even tried to explain it to me...but it was even more over my head at that time. I think it had something to do with all the processes kicked off, then multiplied by the number of users running queries on those tables. "It will bring the database to its knees," he used to say. I never fought much on that because (1) he explained it to me, (2) I trusted him, and (3) he showed me once all the processes spawned during a typical data load operation. (I thought that was really cool by the way, all those neat little processes doing little chunks of work.)

He did, however, let me (us) use them on staging tables. This is typically where our transformations took place: INSERTs and/or partition swapping.
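For what it's worth, a parallel load into a staging table might look something like this (a sketch only; stg_orders and ext_orders are made-up names, and parallel DML has to be enabled per session):

```sql
-- Parallel DML is off by default; enable it for this session
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert into a (hypothetical) staging table
INSERT /*+ APPEND PARALLEL(stg, 4) */ INTO stg_orders stg
SELECT * FROM ext_orders;

COMMIT;
```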

Anyway, I was reading the 10gR2 docs, the Data Warehousing Guide to be exact, specifically the chapter on Using Parallel Execution. Lots of neat pictures...which is good for me.

Then I decided to see what the run times on a SELECT COUNT(*) would be given different degrees of parallelism. If you want to know whether a table or index has PARALLEL enabled, you need to look at the DEGREE column in %_TABLES and %_INDEXES. I spent about 20 minutes looking for PARALLEL in those views before I remembered that.
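For example, checking the DEGREE column for the test table:

```sql
-- DEGREE is the table's default degree of parallelism
-- ('1' means no parallelism; 'DEFAULT' means database-computed)
SELECT table_name, degree
  FROM user_tables
 WHERE table_name = 'BIG_TABLE';
```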
CREATE TABLE big_table AS
SELECT *
FROM dba_objects;
I now have 74,431 records to play with. It's not a lot, I know, I just want to see what happens. I then created an anonymous block to:
1. loop through and change the PARALLELism starting with 1 up to 8
2. get the start time from DBMS_UTILITY.GET_TIME
3. loop doing 20 SELECT COUNT(*)s on the table
4. get the end time
5. get the total time
6. print it out
7. put it in a spreadsheet to compare the results side by side
DECLARE
  l_start NUMBER;
  l_end   NUMBER;
  l_total NUMBER;
  l_dummy INTEGER;
BEGIN
  FOR j IN 1..8 LOOP
    -- DDL inside PL/SQL requires dynamic SQL
    EXECUTE IMMEDIATE 'ALTER TABLE big_table PARALLEL ' || j;

    dbms_output.put_line( 'PARALLEL (DEGREE): ' || j );

    FOR i IN 1..20 LOOP
      l_start := dbms_utility.get_time;

      SELECT COUNT(*)
        INTO l_dummy
        FROM big_table;

      l_end   := dbms_utility.get_time;
      l_total := l_end - l_start;

      dbms_output.put_line( 'Run ' || LPAD( i, 2, '0' ) || ': ' || LPAD( l_total, 3, '0' ) );
    END LOOP;
  END LOOP;
END;
/
And here are my results:

Sorry that it's an image...I can't seem to get decently formatted text into Blogger lately.

For this silly, unscientific test, PARALLEL set to 1 (i.e. NOPARALLEL) was the fastest all the way down. I don't know what it means or if it's even valid, but hey, I'm trying. I figured that if I posted something way above my head, all you nice folks out there would happily correct me if I misstated something. I am avoiding any statements, if you couldn't tell.
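One way to sanity-check a test like this, I believe, is to ask the database whether the queries actually ran in parallel at all:

```sql
-- 'Queries Parallelized' counts parallel queries in this session;
-- LAST_QUERY shows the count for the most recent statement
SELECT statistic, last_query, session_total
  FROM v$pq_sesstat
 WHERE statistic = 'Queries Parallelized';
```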


jpiwowar said...

How many CPUs/cores on your test box? If it's only one, it's not too surprising that parallelism 1 would be fastest. All of those individual chunks of work would all be competing for the same CPU, and the overhead involved in managing contention for that resource would bog things down a fair bit.

From the doc you linked:
"Parallel execution benefits systems with all of the following characteristics:

* Symmetric multiprocessors (SMPs), clusters, or massively parallel systems
* Sufficient I/O bandwidth
* Underutilized or intermittently used CPUs (for example, systems where CPU usage is typically less than 30%)
* Sufficient memory to support additional memory-intensive processes, such as sorts, hashing, and I/O buffers

If your system lacks any of these characteristics, parallel execution might not significantly improve performance. In fact, parallel execution may reduce system performance on overutilized systems or systems with small I/O bandwidth."

Bottlenecks are fun. I once had a user start a concurrent program on an EBS system with parallelism 32, assuming that bigger numbers meant faster. Two things worked against us there:

1) The system in question had two CPUs.
2) The user also had turned on verbose diagnostic logging, so all 32 parallel processes were trying to write to the same log file.

Things were a bit pokey on that server for a while. At least 75% of the execution time for the process was spent in calls to the PL/SQL routine to open the log file for writing. ;-)

Greg Rahn said...

I think your table is too small. Parallel execution is useful for big problems and much less useful for small problems. Move to a table that is 74 million rows (or at least one that is significantly larger). Also, doing a simple COUNT(*) will get you IO bound very fast, as it is very light on CPU cycles compared to joins, aggregations, etc. And if your storage is not capable of delivering the IO requests, it simply won't go faster at a higher DOP. Simply put, you will need some enterprise (or at least server-class) hardware for any tests to be very meaningful.

I'd add this to your reading list: