Monday, September 7, 2009

Data vs. Information

Last week, on the very last line of The Case For Views, I said:
Records in a table typically constitute data. Tables, joined together, in a view, tend to turn that data into information.
That elicited a very, very strong reaction from a good friend and mentor. In the comments, he left this:
Turn data into information? That doesn't make a whole lot of sense to me-- All data is information. Can you clarify that statement a little?
On the face of it, that's not a very strong reaction. He tends to be a lurker, though, and rarely leaves comments.

Then there was Twitter, where he sent me a few more links on the subject.

I'm pretty sure he was fired up.

Once a week or so, we'll get together over beers and have excellent conversations. Occasionally, I'll try to hold my ground from the database perspective. Last week we had a discussion about whether the database should be making web service calls.

Security aside, I thought it was appropriate given the size and skills of the shop, but he and our other friend staunchly disagreed.

Point is, we have some great conversations. It has never come down to "You are stupid!" or anything like that; it's always a conversation, with each side presenting its arguments.

Since my friend has like 28 degrees in engineering, I've learned to give him the benefit of the doubt, so I wanted to study up on the subject.

I asked the oracle-l mailing list on Friday.

My contention, or what I have heard and read, is that a database stores data; only through the use of SQL or some reporting tool does that data get turned into information. I don't know where I first heard or read that, but I've probably been saying it for years.
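To make that contention concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, and the tables, columns, and view (products, order_lines, order_summary) are hypothetical. Queried on its own, the order_lines table returns bare keys and numbers. The view joins it to products and returns labelled rows that a person can read.

```python
import sqlite3

# In-memory database; every table, column, and view name here is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE order_lines (
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL
    );
    INSERT INTO products VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO order_lines VALUES (100, 1, 15), (100, 2, 3), (101, 1, 4);

    -- The view joins the two tables and gives the columns readable names.
    CREATE VIEW order_summary AS
    SELECT ol.order_id, p.name AS product, ol.quantity
    FROM order_lines ol
    JOIN products p ON p.product_id = ol.product_id;
""")

# Rows straight from one table: keys and numbers only.
raw = conn.execute(
    "SELECT * FROM order_lines ORDER BY order_id, product_id"
).fetchall()

# The same rows through the view: product names replace the keys.
summary = conn.execute(
    "SELECT * FROM order_summary ORDER BY order_id, product"
).fetchall()

print(raw)      # [(100, 1, 15), (100, 2, 3), (101, 1, 4)]
print(summary)  # [(100, 'Gadget', 3), (100, 'Widget', 15), (101, 'Widget', 4)]
```

Both queries read the same stored facts. Whether the joined result counts as "information" and the table rows do not is exactly the point under debate here.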

Based on my friend's response and those of others on the mailing list, I probably need to rethink that particular statement.

Here are some relevant links provided by my friend and others on the oracle-l mailing list:

Principles of Communication Engineering, by John M. Wozencraft and Irwin Mark Jacobs

Information Theory and Reliable Communication, by Robert G. Gallager

Nuno Souto, aka Noons, suggested reading Fabian Pascal. He also suggested reading up on Chris Date and Ted Codd, as well as:

Conceptual Schema and Relational Database Design: A Fact Oriented Approach, by G. M. Nijssen and T. A. Halpin

Have you ever used the phrase "data into information" or some variation thereof? I'd like to track down where I first came across it, if possible. Any thoughts on data vs. information as separate entities?


Anonymous said...

Chet, I cannot respond officially yet. The links you provided were to authors, not sources. You gotta make a statement and cite the text, not just give a list of 15 papers to read.

Secondly, before I respond (officially), you are responsible for defining what you believe data to be. You can't just say that "It's stored in the database."

Doing so only reminds me of this conversation:

"Brawndo's got electrolytes. And that's what plants crave. They crave electrolytes. Which is what Brawndo has. And that's why plants crave Brawndo. Not water, like from the toilet."

See? You have to tell me what you think data *is* before I can tell you why you're wrong. :D


Anonymous said...

BTW, before I officially respond, here is another source:

"Data are pieces of information that represent the qualitative or quantitative attributes of a variable or set of variables."

First line of the Wikipedia entry on data


Anonymous said...

My bad. I found another source I'd like to share before I officially respond.

First line: "Data is information ..."


SydOracle said...

Data is the plural of datum. So it is a collection of facts.

I'd say there is an implication that information is something more than that: usefulness, meaning, or understanding.

We store data and use information. Maybe the act of accessing or processing data makes it information.

I think there's a lot of overlap. My understanding of your comment about views is that, by linking tables together in a view, you are adding some interpretation to the facts. The same could be said of a DECODE that translates state abbreviations to long names (we only have half a dozen states in Australia, so we can do that sort of logic in a DECODE).
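Here is a minimal sketch of that translation. SQLite has no DECODE (it is an Oracle function), so this uses a simple CASE expression, which plays the same role. The customers table and its rows are hypothetical, and the 'Unknown' fallback is an assumption, not something the comment specifies.

```python
import sqlite3

# Hypothetical table of customers with Australian state abbreviations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (name TEXT, state TEXT);
    INSERT INTO customers VALUES ('Alice', 'NSW'), ('Bob', 'TAS'), ('Carol', 'XX');
""")

# A simple CASE standing in for Oracle's
# DECODE(state, 'NSW', 'New South Wales', ..., 'Unknown').
query = """
    SELECT name,
           CASE state
               WHEN 'NSW' THEN 'New South Wales'
               WHEN 'VIC' THEN 'Victoria'
               WHEN 'QLD' THEN 'Queensland'
               WHEN 'WA'  THEN 'Western Australia'
               WHEN 'SA'  THEN 'South Australia'
               WHEN 'TAS' THEN 'Tasmania'
               ELSE 'Unknown'  -- assumed fallback for unrecognised codes
           END AS state_name
    FROM customers
    ORDER BY name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Alice', 'New South Wales'), ('Bob', 'Tasmania'), ('Carol', 'Unknown')]
```

The stored codes never change. The query only attaches a human-readable meaning to them at read time.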

Of course, metadata like constraints and even datatypes adds some meaning to data, so in practical terms I don't think there's a line to draw between the two.

moleboy said...

I think there's this idea that, if you look at an entire record in a well-designed database, that is data which is also information.
If you look at a single value of that record, though (say the column is QUANTITY and the value is '15'), that is not really information.
What does it mean?
Data becomes information when given a context.
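A small sketch of that QUANTITY example, again using Python's sqlite3 module and a hypothetical order_lines table. The bare value 15 says nothing by itself. Pairing it with its column names, taken from the cursor's description attribute (the first item of each entry is the column name), gives it context.

```python
import sqlite3

# Hypothetical single-row table holding the QUANTITY value from the comment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, quantity INTEGER);
    INSERT INTO order_lines VALUES (100, 'Widget', 15);
""")

cur = conn.execute("SELECT order_id, product, quantity FROM order_lines")
row = cur.fetchone()

# The value on its own: 15 of what, and for which order?
bare = row[2]

# The same value with its context: pair each value with its column name.
columns = [d[0] for d in cur.description]
record = dict(zip(columns, row))

print(bare)    # 15
print(record)  # {'order_id': 100, 'product': 'Widget', 'quantity': 15}
```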

I'm not sure what the pure theory is, but in every database I've ever worked on, you definitely transform data into information.

David D. said...

When I think of data, I think of the actual bits that are stored or transmitted, vs. information, which is the meaning of that data. For example, a zipped text file has fewer bits of data than its uncompressed version, but contains exactly the same amount of information. You can measure the information in a message, in bits, as the base-2 logarithm of the inverse of its probability. In other words, if I tell you something that had a 50% chance of occurring, I have conveyed one bit of information (and would probably use one bit of data to do so).
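The zipped-file point can be checked with a lossless compressor. This minimal sketch uses Python's zlib module (DEFLATE, the same family of compression used by zip files) on a deliberately repetitive sample string, which is an assumption chosen so that it compresses well. The compressed form takes fewer bytes, and decompressing it recovers every original byte, so nothing the text said has been lost.

```python
import zlib

# Deliberately repetitive sample text (hypothetical), so it compresses well.
text = ("Data vs. Information. " * 200).encode("utf-8")

compressed = zlib.compress(text, 9)   # level 9: best compression
restored = zlib.decompress(compressed)

print(len(text), len(compressed))  # 4400 bytes in; the compressed form is far smaller
print(restored == text)            # True: a lossless round trip
```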

As a contrived example, let's say we have a field that contains 1 bit, and we interpret it this way: if it is set to 1, the lottery numbers for this day were 45, 34, 2, etc.; if it is set to 0, the lottery numbers for the day were something else. On those rare days when it IS set to 1, this field contains over 27 bits of information (assuming any particular sequence is a 200 million to 1 shot). When it is set to 0, it obviously conveys almost NO information at all (in this case, about 0.000000007 bits).
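The arithmetic in that example is the self-information formula, I(p) = log2(1/p). The following minimal sketch computes it, assuming (as the comment does) odds of 200 million to 1 for the winning sequence. The function name self_information is just an illustrative choice.

```python
import math

def self_information(p: float) -> float:
    """Bits of information conveyed by an event of probability p: log2(1/p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

coin = self_information(0.5)              # a 50/50 event
p_win = 1 / 200_000_000                   # assumed odds: 200 million to 1
jackpot = self_information(p_win)         # field set to 1
no_jackpot = self_information(1 - p_win)  # field set to 0: "something else"

print(coin)                 # 1.0
print(round(jackpot, 2))    # 27.58
print(f"{no_jackpot:.1e}")  # 7.2e-09
```

The same one-bit field carries about 27.6 bits of information in one state and about 7 billionths of a bit in the other. That asymmetry is the distinction between data (bits stored) and information that the comment draws.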