Thursday, June 26, 2008

Design - The Entity

I love designing databases. Specifically, data modeling. I love trying to figure out how the data fits together. Is this an attribute of that? Is it shared with something else? What's the relationship? One to many? One to one? Many to many? Hierarchical?

Let's start with people and addresses. A fairly easy mapping. One person can have one or more addresses, or one to many. The basics of that are drawn out below:


Person to Addresses



The primary key (whether natural or surrogate, defined here as a surrogate) for PEOPLE is stored as a Foreign Key in the ADDRESSES table. To guarantee that a PEOPLE key is supplied for each address, you should add a NOT NULL constraint along with the FK constraint.

Not too difficult.

What if down the road you need to add organizations? Do you shove them into the people table or do you create a new table for them? My initial thought is to create a new table called ORGANIZATIONS.


People Address and Organizations



Easy enough right?

Now I want to track addresses for the ORGANIZATION table. Hmmm...how to do that? My first thought was to just create another address table.


People, Address, Organization, Address



Hmmm...now I have addresses in two different tables. I've introduced the chance of having non-standardized addresses (two entry points, or maybe I don't have the same attributes in both tables). Writing a query to get all the addresses (for a mailing perhaps) will be a bit more difficult.

How about just adding another column to the original ADDRESS table?


People, Organization, Address



That works...sort of. I must remove the NOT NULL constraint for PEOPLE and I definitely can't add one for ORGANIZATION. But they're in one table and I can have one entry point (it would have to be conditional though). What if in the future I add another table? Do I just add another column (FKd) to ADDRESS.

It starts to become a bit unwieldy.

So, enter the ENTITY.


The entity

The ENTITY just becomes a placeholder, providing keys for PEOPLE (one to one) and ORGANIZATIONS (one to one). ENTITYID is stored in the ADDRESS table and voila! Perhaps this is called a supertype (not real sure on my terminology here). You now have one entry to the ADDRESS table (well...hopefully). To pull all addresses is easy.

You can use the surrogate key from ENTITY as your primary key in PEOPLE and ORGANIZATIONS, so if you want to bypass the ENTITY table, you can do so easily.

That's my solution anyway. How about you? Same, different? Am I crazy (I'm starting to think so...)?

10 comments:

Nigel said...

Chet

Not so fast! Can't you have, er, 2 people at the same address? Can't a person give the same address as an organization (my company is registered at my home, for example).

A person - and certainly an organization - can have multiple addresses (though maybe just one currently preferred, primary address).

Keep smiling!

Regards Nigel

oraclenerd said...

Preferred or primary would be an attribute of the address.

Yes, you can have 2 people at the same address. But should 2 people be allowed to edit that address? I say no.

dmcghan said...

Chet,

lol, this sounds familiar. I remember battling this one out for a while. I agree, however, that even if the same address is used more than once, it should be a separate record. You're storing someone's address which can change, not the address alone.

I moved away from this model altogether. If it's 1 to 1, I would just add a few columns to the persons table to store their address. It alleviates the need to add separate db objects to enforce a more normalized approach. It can also improve performance down the road as you don't have to join the tables.

If it's 1 to many I would create a PEOPLE_ADDRESSES table and only store addresses related to the PEOPLE table.

Regards,
Dan

ak said...

In such situations I use to introduce a new column in the addresses table, e.g. x_id_type which logically refers to people or orgs. The FK column in addresses in that case would be just x_id.

Finally to get the according address you have to join over both columns (x_id_type, x_id)

oraclenerd said...

Dan,

I just want to be able to capture addresses for multiple entities, a people_address table wouldn't work...well, it would, but I'd have to have org_addresses as well.

I'd rather the addresses be stored in one location.

oraclenerd said...

@ak

I've seen that one as well, I just don't prefer it.

I'm not even sure if it's the best way, but the entity seems logically correct to me...

The Fiesty (ex)CoWorker said...

Default rows in your Person and Organization tables solves your NOT NULL constraint issue and allows you bypass the Entity table altogether.

You're familiar with them...-99 padded rows.

Additionally, presuming the table data may be used in some time of drop down styled application interface, a default row makes it easier to code around, too.

The Fiesty (ex)CoWorker said...

The original design had the address as the primary piece of data and the person and organization data were attributes of it.

The revised design, using the entity cross reference table, takes the focus off the address and introduces a new higher order to the data.

In a past life, I dealt with the same thing. That design also introduced an entity table and had addresses, persons, and business attached to it.

Here's a monkey wrench for you Chet; a person can be an attribute of the organization, right? How does that factor into your model?

oraclenerd said...

Hey Feisty!

True, it would solve it. But I would never (well, maybe not never), use padded rows in an OLTP system. I see the use for them in Data Warehousing, so you don't have to outer-join anything and everything...

oraclenerd said...

@Feisty

So, I post about comments and somehow I missed this one. Sorry about that!

I've had a couple of beers so I'll have to wait a little while to read and respond