ORACLENERD twitter/oraclenerd view chet justice's profile on LinkedIn feed oraclenerd t-shirts Unemployment Clock:

  Learning About Performance
I've found myself at a certain disadvantage lately, specifically in regards to understanding tuning. I don't like being at a disadvantage...I'm competitive that way.

I understand, from a very high level of what needs to be done, but I could not prove my theories. Well, I could, sort of, but it wasn't necessarily a reasoned or logical approach. I just tried all possible permutations. There's a certain time constraint with that method and when you're talking about huge volumes of data (i.e. datawarehouses), time is short.

I've decided it's time to change that, so I'll begin to peck away ever so slowly.

In Usage Tracking, there is a table called S_ETL_TIME_DAY. It's a sister table to S_ETL_DAY which is your everyday time dimension. S_ETL_TIME_DAY breaks down a single day into hours and minutes which means there are 1440 records (24*60*60).

Somewhere I saw the following SQL:
SELECT DISTINCT hours
FROM s_etl_time_day
which just returns the hours in the day (24). I've pondered on whether DISTINCT is a bug, but it seems fairly innocuous here.

What about this though?
SELECT rownum - 1 hours
FROM dual
CONNECT BY LEVEL <= 24;
Which one is faster?

Let's try an explain plan

Query 1
S_NQ_SCHED@TESTING>EXPLAIN PLAN FOR
2 SELECT rownum - 1 hours
3 FROM dual
4 CONNECT BY LEVEL <= 24;

Explained.

Elapsed: 00:00:00.05
S_NQ_SCHED@TESTING>@EXPLAIN

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
Plan hash value: 1731520519

------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 (0)| 00:00:01 |
| 1 | COUNT | | | | |
|* 2 | CONNECT BY WITHOUT FILTERING| | | | |
| 3 | FAST DUAL | | 1 | 2 (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter(LEVEL<=24)

15 rows selected.
Query 2
S_NQ_SCHED@TESTING>EXPLAIN PLAN FOR
2 SELECT DISTINCT hours
3 FROM s_etl_time_day;

Explained.

Elapsed: 00:00:00.04
S_NQ_SCHED@TESTING>@explain

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------
Plan hash value: 878743397

-------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 24 | 72 | 5 (20)| 00:00:01 |
| 1 | HASH UNIQUE | | 24 | 72 | 5 (20)| 00:00:01 |
| 2 | TABLE ACCESS FULL| S_ETL_TIME_DAY | 1440 | 4320 | 4 (0)| 00:00:01 |
-------------------------------------------------------------------------------------

9 rows selected.
Sadly, about the only thing I can usually see in an explain plan is whether an index is being picked up or if there is a nested loop. I'm not going to worry about all of it now...this is just a start. More to get me in the habit.

Next I set up tracing since "reading" the file is so much easier now.
ALTER SESSION SET TRACEFILE_IDENTIFIER = 'HOURS';

ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
Here's the output:
Query 1Query 2
What does it all mean? I am sure you know or understand better than I. For me, I just need to create the habit. I'll read (more) about the specifics of the explain plan and the output from tkprof. If you want to explain, feel free. If you want to do it in your own post, link it up or you can write it here. I'd be happy to host it.

Labels: , ,

 
  OBIEE: Usage Tracking Performance Improvements
Usage Tracking, in OBIEE, allows you to capture some basic statistics on your Presentation layer.

I mused recently about diving into the "internals" of OBIEE, specifically learning how to read the logs (Execution Plans) and other stuff.

Mr. Mark Rittman was kind enough to share his thoughts on the matter in the comment section. I'm reposting here because I believe it is valuable insight.
Regarding the thoughts on OBIEE performance. Here's my thinking on the subject.

When I found out about concepts such as OBIEE "driving tables", level 5 logging and BI Server execution plans, my first reaction was to try and dissect them, try and understand them in the same way that I understand the various types of Oracle joins (hash/nested loop etc), Oracle trace files, 10035 traces etc. At the end of the day the BI Server is a virtual database engine and it makes sense to understand and tune it in the same way that we do with Oracle.

But then, the reality is, if you are spending time understanding level 5 log files, the various federated query optimization techniques, to be honest you'd be better off trying to optimize the underlying data so that the BI Server didn't then have to use these features.

For example - if you're trying to optimize a federated query, you should try and co-locate the data instead, either by copying both data sets to the same physical database, or better still, combine them into a data mart or warehouse. If you are having problems with aggregation and start thinking about the Aggregate Persistence Wizard - well you'd be better off putting the data into Essbase instead, and run OBIEE against that rather than your detail-level database.

And so on. So whilst it's interesting to see how the internals of the BI Server work, how it joins data, how you can optimize cross-database joins etc, in the end this is more from the intellectual angle, as you get far more "bang for your buck" by sorting out the data layer away from OBIEE and then bring a more optimized set of data into the BI Server. "Less is more" is my perspective with OBIEE, and the more that we can do at the (Oracle) database or (Essbase) OLAP server end, and keep the BI Server model simple, is the better in my book.

Just my 0.02
That last paragraph is key. Push the data into the database and keep the OBIEE model simple. Sounds right up my alley!

With Mr. Rittman's comments swirling in my head, I went about my performance tuning a bit differently.

We added some user tables to the Usage Tracking model, which seem to bring the performance to a screeching halt. A halt is defined at about a minute or so for the complete page to render...well, the page renders, I guess I mean the queries to return.

Here's essentially what it looks like:

Usage Tracking Setup

Two databases and three separate schemas.

The very first thing I did was limit the number of rows returned to the BI Server from S_NQ_ACCT. Looking at the data more closely, it was pulling in more than just the reports, it had stats for ValuePrompts and Filters as well. Those weren't needed. I created a view on top of S_NQ_ACCT which returned about a quarter of the records. For perspective, I'm dealing with about 400K records.

Added that to the Physical Layer (File --> Import --> From Database) then updated the sources.

Ran the offending page...no difference.

Looking at the logs, I can see 4 queries being sent to the 2 databases. It appears that OBIEE won't push the join to the database if using separate connection pools. After thinking about it, it made perfect sense, OBIEE can't assume that each connection has the same privileges as the other. (Though I cannot confirm that yet, just seems logical).

Next, I pulled the data from Schema 2 into Schema 1.

Ran the page, no difference.

Checked the logs, and there were still 4 queries sent to the 2 databases. To be fair, I may have run the page prior to the changes being propogated to the BI Server.

OK...let's pull the data from DB 2 and see what happens. Import the table, configure everything.

Ran the page, no difference.

Looking at the logs this time, I saw 3 queries. Better right?

For fun, I decided to put the results of Query 2 and Query 3 into a table. Configured everything...blah blah blah.

Ran the page, it can up almost immediately. I called 2 people and had them run the same report and they saw the same behavior. Win! (Caching is not turned on in this environment).

When I looked at the log, a single query was sent to the database. It was an ugly query, but the joins took place in the database as opposed to the BI Server.

This won't be the final implementation, I'll probably have to back out these changes. But it's a great proof of concept.

Labels: ,

 
  Playing With PARALLEL
I'm trying to learn about PARALLEL and how it affects SELECT statements in a DW/DSS (non OLTP) environment. I can't really say I know what I'm doing or what I'm looking at, but I'm trying.

I had a DBA strongly discourage me from using PARALLELism on reporting tables (i.e. facts or dimensions). I think he even tried to explain it to me...but it was even more over my head at that time. I think it had something to do with all the processes kicked off then multiplied by the number of users running queries on those tables. "It will bring the database to its knees" he use to say. I never fought much on that because 1, he explained it to me 2, I trusted him and 3, he showed me once all the processes spawned during a typical data load operation. (I thought that was really cool by the way, all those neat little processes doing little chunks of work).

He did however let me (us) use them on staging tables. This is typically where our transformations took place. INSERTs and/or partition swapping.

Anyway, I was reading the 10gR2 docs, the Data Warehousing Guide to be exact on Using Parallel Execution. Lots of neat pictures...which is good for me.



Then I decided to see if I could see what the run times on a SELECT COUNT(*) would be given different levels of parallelism. If you want to know if a table or index has PARALLEL enabled, you need to look at the DEGREE column in %_TABLES and %_INDEXES. I spent about 20 minutes looking for PARALLEL in those tables before I remembered that.
CREATE TABLE big_table 
AS
SELECT *
FROM dba_objects;
I know have 74,431 records to play with. It's not a lot, I know, I just want to see what happens. I then created an anonymous block to:
1. loop through and change the PARALLELism starting with 1 up to 8
2. get the start time from DBMS_UTILITY.GET_TIME
3. loop doing 20 SELECT COUNT(*)s on table table
4. get the end time
5. get the total time
6. print it out
7. put it in a spreadsheet to compare the results side by side
DECLARE
l_start NUMBER;
l_end NUMBER;
l_total NUMBER;
l_dummy INTEGER;
BEGIN
FOR j IN 1..8 LOOP
EXECUTE IMMEDIATE 'ALTER TABLE big_table PARALLEL ' || j;
dbms_output.put_line( 'PARALLEL (DEGREE): ' || j );
FOR i IN 1..20 LOOP
l_start := dbms_utility.get_time;

SELECT COUNT(*)
INTO l_dummy
FROM big_table;

l_end := dbms_utility.get_time;

l_total := l_end - l_start;

dbms_output.put_line( 'Run ' || LPAD( i, 2, '0' ) || ': ' || LPAD( l_total, 3, '0' ) );
END LOOP;
END LOOP;
END;
/
And here are my results:



Sorry that it's an image...I can't seem to get decently formatted text into Blogger lately.

For this silly, unscientific test, PARALLEL set to 1 (i.e. NOPARALLEL) was the fastest all the way down. I don't know what it means or if it's even valid, but hey, I'm trying. I figured that if I posted something way above my head all you nice folks out there would happily correct me if I mistated something. I am avoiding any statements, if you couldn't tell.

Labels: , ,

 


Guest Authors

How To

Popular

Previous Posts

Code Projects

Archives
August 2007 / September 2007 / October 2007 / November 2007 / December 2007 / January 2008 / February 2008 / March 2008 / April 2008 / May 2008 / June 2008 / July 2008 / August 2008 / September 2008 / October 2008 / November 2008 / December 2008 / January 2009 / February 2009 / March 2009 / April 2009 / May 2009 / June 2009 / July 2009 / August 2009 / September 2009 / October 2009 / November 2009 / December 2009 / January 2010 / February 2010 / March 2010 /


Aggregated by OraNA