Tuesday, May 19, 2009

Indexing High Transaction Tables

I haven't had many opportunities for the past couple of months to write code, something I sorely miss. Since I changed jobs, I've been doing mostly analysis (aka learning all about the data). I think the last time I wrote any PL/SQL was March...wow.

One thing I have been tasked with doing is re-designing how we process payments. Many of the core tables have been bastardized in that entities are mixed up with attributes...or at least I think that's how you describe it. I'll show you an example and maybe someone can tell me what I am trying to explain.

Let's take a lookup table like payment types. You'd have Check, ACH and Credit Card. I'm sure there are others, I'm just trying to get the point across. Fairly simple right? Well, what has happened is that the business has expanded and in order to handle it, new payment types were created.  Now we have Company 1 Check, Company 1 ACH and Company 1 Credit Card. Those lookup records have now been imbued with a different meaning.  What's the term for that?

Oh, you were interested in the title? I'm getting there.

We store a payments in a "transaction" table with a few different lookup tables (like payment types) hanging off of it. We're not talking about Visa volume or anything, but a modest multiple of 10 thousand transactions a day. We hope to be getting a lot more soon. When considering the design of these types of tables, I have tended towards denormalization. You want to get in and out real quick without the need to lookup (I typically use codes for the lookup keys as opposed to the number, "ACH" or "CREDIT_CARD" for example) or write data to a bunch of different tables. One INSERT and done.

Hold on, I'm almost there.

As I was analyzing the data, I found that many of the columns I was grouping on or trying to filter on were not indexed. I suppose someone could make the case that leaving indexes off of this type of table would be good. My intuition is that they should be indexed, that the "overhead" of the index is nominal, but of course I have no proof one way or another.

I'd like to know what the DBAs out there think. A small aside, I consider myself a "dba" (note the lowercase) in that I have some crossover skills, but they tend to be from the application standpoint. "DBA" are people that keep the database running all the time. The ones that can perform a recovery (I can do backups, but I have yet to perform a recovery). The ones that love to dig into the internals of Oracle to see what's going on.

What do the DBAs think? Index a high volume table or leave them off? Either way, please explain yourself so I might learn something. ;)

18 comments:

HunterX said...

What do the DBAs think? Index a high volume table or leave them off?
I'm not a great DBA... many skills was started from programming and System Administration.
So, It's a difficult thing to analyze data.

If I think ;)
I'll think what does application make high transaction?
&
Many transactions to read or write... ???

With many transactions to read -> Index is good way to improve performance.

With many transactions to write -> No Index is good way to improve performance.
But Need Index ..perhaps using indexes with nologging might help.

Another way else... no idea ;)

John T said...

All depends... On a small lookup table an index won't buy you much.

On a large heavy insert/update table there will be a cost with every DML operation. But I do think people get overly concerned with this cost without facts. You need to test & benchmark for the truth.

jpiwowar said...

Hi Chet,

I won't lay claim to the all-caps DBA designation as you define it, but here are my thoughts. Apologies if they sound like an
excerpt from the Adventures of Captain Obvious; I make no claims to originality. ;-)

Bottom line: Don't index where it won't help. For a high- (and hopefully increasing-) volume OLTP table as you've described, indexes that don't support day-to-day operations of the applications using that table should probably be avoided, even if it looks like the indexes "should" be there. Even if your system can absorb the additional work required for writing to the extra indexes for now, as your transaction volume grows, the burden of performing the
additional writes will grow as well. Your ad-hoc (at least, I presume they're ad-hoc) analysis queries may suffer in the short term from lack of extra indexes, but that's better than degrading performance in your payment processing application.

JohnT makes a good point, though: nothing beats a few rounds of testing to determine what's acceptable, performance-wise. Otherwise, all you're left with is a bunch of empiricists with opinions, and "best practices" that may not apply to your situation.

Man. I stopped by to heckle, you know, just to return the favor and all. But then I saw that you had accumulated a few thoughtful
comments, and I changed my tune. Curse you, peer pressure!

Regards,

John P.

oraclenerd said...

Mr. John P,

Fortunately or unfortunately I have to sit next to John T (aka @serge_a_storms), so he pesters me with his big DBA stuff all the time.

I agree with your take as well. However (or do i?), what if we want to do some basic decisioning based on the data in that transaction table? I don't like storing derived data (except in a data warehouse) mostly because I don't like writing the code to maintain those derived tables (I guess a Materialized View would do, but in an OLTP system?).

The decisioning would need to be done off of the values that "should" be indexed.

I guess the ideal would be a DSS, but we don't (and won't) have that in our sights yet.

oraclenerd said...

@HunterX or should I say @surachart? ;)

You can't keep changing names on me!

Index with NOLOGGING? Hadn't thought of that one. I might have to throw together a test case for that.

oraclenerd said...

Mr. John P,

Oh yeah, you can't come here and heckle me!

Aman.... said...

Hi,

Well I don't know much about Oracle so let's leave the topic of being great alone!

My comment, indexing a high volume table without actually be assured that this would bring benefit not just for a single query but for couple of them, is not going to be a very good idea. I have seen a database where we had to drop many indexes because with their presence , our transactions and redo were getting heavily impacted.

I guess what I said sounds like a generalized ROT. So I would conclude by saying that "it depends". As like always, index is meant to make the life of a query better. If it does that, it can stay, if it doesn't , it should(not saying must) go.

Just a "me too" comment!

Cheers
Aman....

PS: Why your t-shirts are not shipped to India? I wanted to get 2 but found that they are not available for India :-( .

oraclenerd said...

Aman,

Thanks for your input. I think that's the way I'm leaning (testing to prove it out of course), to leave the indexes off. Of course I am really just in the design phase and haven't spoken to the DBA just yet...but I will.

re: T-Shirts
I'm not sure why they won't ship to India. I just signed up with them a short while ago to give them a try.

Drop me a line at chet@oraclenerd.com and I'll see if I can get some to you. I printed up 15 for the COLLABORATE 09 (and they're gone) but I will probably be doing another printing run soon. You just have to promise to send me a picture in front of a local monument for me. ;)

chet

Aman.... said...

Chet,

Thanks for the reply!
I'll see if I can get some to you. I printed up 15 for the COLLABORATE 09 (and they're gone) but I will probably be doing another printing run soon. You just have to promise to send me a picture in front of a local monument for me. ;)That would be just so cool. About the picture in front of local monument, sure you got it ;-) .

jpiwowar said...

>(I guess a Materialized View would
>do, but in an OLTP system?).

Well, it depends. Would you need up-to-the moment data for the decision support side? If not, then an mview might not be a bad compromise. Rather than multiple writes to a handful of indexes, you could trade off a write to the materialized view log, which could be used to fast refresh the mview at a time when the system's quiet.

I'm pre-coffee, so maybe this is a really bad idea. Sounds like a fun test, thought, and could generate a blog post or two for ya. ;-)

Regards,

John P.

ps No heckling? Geez, way to take the joy out of the Internet. Next you'll be saying no LOLcats. :)

Crisatunity said...

A few thoughts:

#1 If the cardinality of of values is low a b-tree index may not help as much as your intuition tells you. Low cardinality is ripe for bitmap indexes, but you can't use them effectively on a table with frequent inserts.

#2 Stop using your intuition. Start testing and proving.

#3 If you've never done a recovery, then you've never done a backup.

oraclenerd said...

@crisatunity

#1
Bitmap indexes are definitely out. The cardinality ranges from low to high on the potential target columns.

#2
I have to think about it first though. I'm not going to jump into testing out the theory without some idea of what I would like to do or what I would expect.

#3
Good point. How would I even know if the backup was successful if I have not had to recover using said backup.

oraclenerd said...

Mr. John P,

I witnessed potential problems with Materialized View logs on a transactional system recently so that would probably be my last resort. Full refresh MVs could be run off hours, but it's still just moving the process to a different place wouldn't it?

If we're shooting for thousands of transactions a second or minute, I could understand the need/desire to look at the really small things (like whether or not to have indexes). I don't think we are there and probably won't be there before too long (years). Worse case scenario though is just drop the index(es) right?

Crisatunity said...

I believe bitmap indexes require Enterprise Edition - which I think you've mentioned before isn't in your mix (if I'm remembering correctly).

Sometimes getting "some idea" about what to expect on any theory involves having a healthy appreciation for testing you've done. All the books and advice in the world are no match for personal experience and actual testing. DBA'ing is unfortunately a profession riddled with individuals that read and follow others advice without ever testing. There's a creepy amount of "faith" that many DBAs rely upon.

Frankly, it doesn't take much effort to evaluate explanation plans before and after creating an index and seeing if your index is utilized.

Tom said...

I agree with Chrisatunity and John,

First, analyze what you are trying to do and why? Depends on what you are trying to improve. Also, if you want to enforce things like constraints, then depending on what you are doing, you will most likely use indexes to enforce it. PK's and Unique's by default, but look at FK's too! Like your idol Tom says (not me, unless you want to say I'm your idol lol) It depends.

Also, in low cardinality, not much help.

Joel Garry said...

Those lookup records have now been imbued with a different meaning. What's the term for that?Intelligence in keys, overloading, or violation of second normal form, depending on the details and context.

Old fashioned formal design methodology would suggest starting with the report outputs desired and working backwards.

If you insist on throwing everything in a denormalized table, you will have to have some process to normalize that so you can use this relational database thingie with fewer consequential issues. As you are discovering as you try to grope or group things or whatever you are doing.

Your payments are generally going to have to be applied through some business rules to multiple sales, right? People are going to want to inquire about that, right? Customers are going to be calling and wanting to know WTF you did with their money, right?

So yeah, have a transaction table, but spit out something relational with indices and stuff so you can run the business.

John T said...

Yeah, I'd definitely rule out bitmap indexes. I saw a DBA try this in a OLTP system - ONCE. Every update locks the table. Yes, their experiment was a disaster. Users were complaining about performance for 4 hours before someone figured out it was the bitmap index.

For this application, you might want to check out clustered tables. See: link.

Its a pain in the ass to setup, and I think you have to structure your insert statements differently to properly utilize the cluster. But it should reduce i/o on DML operations. (the cluster should be limited to the transactional tables)

oraclenerd said...

Joel,

"Intelligence in keys, overloading, or violation of second normal form..."

Thank you for that. I'll have to research further to get my terminology up to date.

The "Old fashioned design methodology" you mention is what I tend to follow. I guess it's because I started out as an end-user. What's the point of building a system if you can't get anything out of it?