ORACLENERD: mcohen

Thursday, April 1, 2010

March Madness

No, it's not about basketball. I can't seem to get into it this year, despite all the upsets (from what I hear).

This month, oraclenerd passed the 10,000 visitor mark...that's just amazing to me. Thank you.

Almost hit 16,000 page views as well. 10 shy if you're curious. Again, awesome.

Like last month, I'm posting the top 10 list based on views for March.

Page	Views
Learning OBIEE	763
EBS Install Guide - Part 1 by John Piwowar	573
DBMS_CRYPTO: Example	381
The Case for the Bit Bucket by Michael Cohen	352
APEX: Install 3.2.1 on 11gR2	331
EBS Install Guide - Part 2	330
OBIEE: How to Migrate Your RPD	238
ORA-08177 can't serialize access for this transaction	236
OBIEE: Install on OEL 5.4	225
BULK COLLECT and FORALL	222
EBS Install Guide - Part 3	210

On a related note, John cracked the top 10, all time, in pages viewed on oraclenerd. His post has been up for just under 4 months and I've been at this for almost 31 months. Maybe if I had John post more often I could actually make (more) money here. As it stands, I owe him about $0.53 USD.

If you want to take a crack at John, I'm still accepting guest posts. Had 2 this month, both on the database as a bucket meme. Michael O'Neill had extended comments in Everything is a Bit Bucket and Michael Cohen took the other side...as usual.

Tuesday, March 16, 2010

The Case for the Bit Bucket

By Michael Cohen

Mr. Cohen is most famous here for the discussions we've had before about Application Developers vs. Database Developers. Part I is here, Part II is here. Mr. Cohen is a friend of mine. I have a great deal of respect for him. We obviously disagree in some areas, but I've learned to appreciate his push-back and learn from it.

He left a comment yesterday on my most recent post, The "Database is a Bit Bucket" Mentality (Michael O'Neill posted his followup, Everything is a Bit Bucket, as well). I thought the comment would get short shrift, so I asked him to write up a full post. I deleted that comment and here is his post.

Modern RDBMS's are quite powerful today. Pretty much every one of them has full support for SQL, including vendor extensions, all of the features we've come to expect from a relational database, a full fledged programming language built in, and quite often support for extras like full text search or native handling of XML. Most now also ship with highly feature specific add-ons - PostgreSQL has a geospatial package that makes it the de facto standard in that domain, MySql has hot replication in a master-slave paradigm, Oracle has....well, Oracle has all kinds of things, a full object system and Java inside, a message broker, an HTTP server, a complete UI toolkit, among other things.

So the question arises as to how much of this capability one should use. I think it's becoming apparent that the answer to this is, "not much." Why shouldn't you take advantage of as much of the database's feature set as possible? The answer is performance and scalability. But wait, aren't stored procedures faster than ad hoc queries? Yes (theoretically). Won't it be more performant to execute business logic as close as possible to the data it operates on? Why should we introduce yet another component into the architecture when the database is perfectly capable of handling a particular task?

For one thing, the programming languages and environments offered by relational databases are now relatively long in the tooth, and have been eclipsed by modern OO languages. Developers are much more productive building applications with these new languages, and find it painful and tedious to work within the relational model, with SQL. You can see proof of this now with the overwhelming popularity of ORM frameworks in most of the popular OO languages out there. Java has Hibernate/EJB/JPA and many others. Ruby has ActiveRecord, DataMapper, and Sequel. Python has SQLAlchemy and Django's ORM. And it's not because these developers lack the skills to work with the database directly. Quite the contrary actually, it takes intimate knowledge of the database to work effectively with an ORM. What's more, the ORM is often able to make runtime optimizations that would be difficult or prohibitively time consuming to hand code. Finally, clustered caches offer massive performance and scalability improvements, handling writes back to the database transparently behind the scenes, but for the most part they preclude implementing complex business logic in the database.
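To make the clustered-cache point a little more concrete, here is a minimal, single-process sketch of the write-behind idea: writes land in memory and only the latest value per key is persisted later. Everything in it (the `WriteBehindCache` class, the `quotes` table, the use of SQLite) is a hypothetical illustration, not how any particular data grid product is implemented; real clustered caches add distribution, replication, and failure handling.

```python
import sqlite3


class WriteBehindCache:
    """Toy write-behind cache (hypothetical; real clustered caches are distributed)."""

    def __init__(self, conn):
        self.conn = conn
        self.values = {}    # key -> latest value, served from memory
        self.dirty = set()  # keys changed since the last flush
        self.db_writes = 0  # number of rows actually written to the database

    def put(self, key, value):
        # Writes touch memory only; the database is updated later by flush().
        self.values[key] = value
        self.dirty.add(key)

    def get(self, key):
        return self.values[key]

    def flush(self):
        # Persist only the latest value of each dirty key, in one transaction.
        rows = [(key, self.values[key]) for key in sorted(self.dirty)]
        with self.conn:
            self.conn.executemany(
                "INSERT OR REPLACE INTO quotes (symbol, price) VALUES (?, ?)", rows
            )
        self.db_writes += len(rows)
        self.dirty.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL)")

cache = WriteBehindCache(conn)
for tick in range(60):
    cache.put("ORCL", 25.0 + tick)  # 60 updates, all in memory
cache.flush()                        # a single row written to the database

stored = conn.execute(
    "SELECT price FROM quotes WHERE symbol = 'ORCL'"
).fetchone()[0]
```

The trade-off is visible in the sketch: the 59 intermediate prices never reach the database, so anything that needs that history has to capture it somewhere else.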

The overall trend is clear, across languages and platforms. It's the movement of data out of the database and into the application layer. Less and less reliance on the database, perhaps only for archival purposes. John Davies has a good comment on this. He's operating in a unique environment with extremely rigorous performance requirements, but we're now starting to see similar constraints imposed by the web. There's a whole class of software that has come about due to the inability to scale the relational database beyond a certain point. Facebook developed Cassandra, now used by Twitter, Reddit, and Digg, among others. LinkedIn built Voldemort. My employer doesn't deal with the massive scale of these companies, but we do large scale data processing with Hadoop. HBase, another non-relational persistent data store, is a natural fit, and just about the only option really. We use MySql less and less.

Of course, not everybody is building applications with such high scalability requirements. But even for applications with less intensive scalability requirements I would argue the same tendency to minimize the workload on the database should apply. Cameron Purdy has a good quote, "If you don't pick your bottlenecks, they'll pick you." Design your application to bottleneck, he says. What he means is, your application is going to bottleneck on something, so you need to explicitly decide what it will bottleneck on. Unfortunately, most applications bottleneck on the database, as this is the hardest layer to scale. It's pretty easy to scale the front end, we just throw more instances of Apache out there. It's a little bit harder, but not much, to scale the app server. But it's pretty hard to scale the database tier, particularly for write intensive applications. For well funded organizations, Oracle RAC is the standard. MySql's master-slave setup and hot replication saw it win out over PostgreSQL despite the fact that Postgres is a much better database in just about every other respect. The NoSql projects listed above grew out of the inability even to scale out MySql.

The trend is clear. We're collecting and processing more data than ever before, and this will only increase as we go forward. Unfortunately, the relational database (at least in its current form) isn't well suited to the scale of data processing an already significant and growing number of organizations deal with on a daily basis. We're now seeing new solutions come forth to address the shortcomings of the traditional RDBMS, and the same forces that have necessitated such developments are at work even in smaller organizations. At all levels, developers would do well to require as little functionality as possible from the database, essentially, to treat it as a bit bucket.

Tuesday, July 7, 2009

Classic: Application Developers vs. Database Developers II

The original (with all the fun comments) can be found here. Originally posted on December 9, 2008. This is the "followup" to yesterday's post.

You can read the first article here. My application developer friend, Mr. M, emailed me and another fine gentleman this little blurb recently:

Mr. M:

OH YEAH BABY!!! TEN TIMES FASTER!!!! YEAH!!!!!!!!

Hey seriously, what a tub of shit Oracle is. Where does this myth come from that it's such a great platform? Their client tools suck balls and it's generally just a pain in the ass to work with from a developer's point of view. But devs for some reason are under this impression that from the server perspective it's rock solid and performant. Well, it may be solid, but it's a fucking turd. Our dba here - definitely an Oracle guy - has been tasked with looking into moving us onto an oss db. He basically refuses to even look at MySQL, stating that it's a mickey mouse worthless pile of shit (can't really argue with him there lol), so that basically leaves Postgres. So it's been a few weeks now, and he will basically admit now that Postgres completely waxes Oracle as far as performance goes. LOL We run RAC in production too. He's looking at failover, replication, blah blah blah now, we'll see what the verdict on that is. Oh, and Oracle AQ? That's a worthless pile of shit too. Why do they tack a fucking message queue that doesn't play nice with jms onto the fucking database, which is probably already overworked? Oh wait, that's right, they're in the business of selling per cpu licenses! Cocksuckers.

This was prompted by a recent Oracle email blast about the Exadata storage system/Warehouse.

As I did before, I'll just put the email here.

Me:

Agreed, their client tools aren't all that great. Which ones are you using?

I use SQL*Plus (naturally), SQL Developer and JDeveloper. The latter 2 tie in very nicely with Subversion. With JDeveloper, when I want to try out Java, it's got a pretty good suite of tools.

Oracle started out catering to businesses; Microsoft started with the consumer. Oracle has made pretty good strides in making their software more usable while Microsoft has made theirs more scalable.

I haven't used AQ a whole lot and definitely not with Java. I do know that it's billed as JMS compliant.

Postgres has its place and so does Oracle. It is a great platform if you know how to leverage it. PL/SQL is a 3GL (or is it 4) and you can write entire applications using it...and if you do that right, an incredibly robust and scalable application at that.

Mr. M:

"It is a great platform if you know how to leverage it. PL/SQL is a 3GL (or is it 4) and you can write entire applications using it...and if you do that right, an incredibly robust and scalable application at that."

NO!!! NO!!! NOOOOO!!!

I want to beat people like you who say this with a ball-peen hammer. There are only a select few individuals on this earth who can write and read application logic written in SQL. AVOID THIS ANTI-PATTERN AT ALL COSTS! What is it with you f_cking database guys??? Why do you always want to stuff as much crap into the db as possible?

DUDE! SQL IS A PILE OF SHIT FOR EXPRESSING APPLICATION LOGIC!!!

It's fine for querying and manipulating sets of data, in a relational database. But it is a worthless sack of shit for expressing application logic!

I'm having to dig through this f_cking abortion right now because some Oracle f_ckhead thought "you can write entire applications using it!" Blog that, mofo!

This was followed by a package he had been working on. I wouldn't say it was the greatest, but it wasn't all bad either.

Me:

goodness gracious.

"DUDE! SQL IS A PILE OF SHIT FOR EXPRESSING APPLICATION LOGIC!!!"

disagree (naturally). It's incredibly easy to do, you just don't know how yet...and it seems even the Oracle professionals out there don't either.

I'll tell you this, the crazier the SQL or PL/SQL needed to derive and manipulate data the poorer the design. Start with a good design and it all becomes very simple.

Of course note the use of "naturally" in my lexicon. Thanks Jake.

Mr. M:

well dude, we are back to our old discussion - you arguing that procedural sql code is perfectly fine for building apps, and by extension, that the last 20 years of computer science have basically been a misguided lost journey down the meandering, fruitless trail of oop. um.....no. select this from that. otherwise keep that sql crap caged up where it belongs.

btw, do the db guys here suck? seriously. i'm not competent enough to judge. (to be fair, apparently that crap i sent you is fairly old stuff, but still....)

Me:

I would say, based on limited knowledge of software development, that the OOP movement was started because the database (specifically Oracle) was not mature enough to do what was needed. Plus, I seem to recall that the OOP movement was supposed to have solved all the world's problems by now.

It's further propagated due to the needs you specified in our discussion that day at WellCare (i.e. performance). I still believe that if application developers better understood how a DB works (Oracle, MySQL, SQLServer, etc) they would end up writing less code. Database constraints alone force you to write less (and better) code simultaneously ensuring good data.

The code that I did look at (first 1000 lines or so) isn't great.
1. With all those IF THEN ELSE statements it's telling me that there's probably a better way to store the data. Looks like they're missing an attribute that should be contained with a track.
2. using Object (PL/SQL) types to store data in the way they seem to be doing it is not the best way. Again, probably a design issue.
3. When you do something like this:
   UPDATE pb_album_metadata
      SET primary_digital_flag = 0
    WHERE album_id IN (SELECT b.album_id
                       FROM (SELECT a.album_id AS album_id,
                                MAX(a.album_id) OVER (PARTITION BY a.standard_upc) AS latest_album_id
                             FROM pb_album_metadata a
                             WHERE a.standard_upc = g_album_tab(1).standard_upc ) b
                       WHERE b.album_id <> b.latest_album_id )
         AND primary_digital_flag <> 0;
They should probably have considered end_date as an attribute of the album metadata. While analytic functions are pretty cool, they're more for analyzing (OLAP) and not OLTP environments.

That's for starters and without table definitions...
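For readers following along without the original package, here is one way the same update could be expressed with a plain aggregate subquery instead of an analytic function. This is only a sketch under assumptions: the table is a hypothetical, cut-down `pb_album_metadata` (the real definition is not shown), it runs in SQLite through Python rather than Oracle, and it assumes `album_id` is unique. It does not add the suggested `end_date` attribute, which would be a data model change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down table; the real pb_album_metadata definition is not shown.
conn.execute(
    "CREATE TABLE pb_album_metadata ("
    " album_id INTEGER PRIMARY KEY,"
    " standard_upc TEXT NOT NULL,"
    " primary_digital_flag INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO pb_album_metadata VALUES (?, ?, ?)",
    [(1, "upc-A", 1), (2, "upc-A", 1), (3, "upc-A", 1), (4, "upc-B", 1)],
)

upc = "upc-A"  # stands in for g_album_tab(1).standard_upc
with conn:
    # Clear the flag on every album for this UPC except the latest (highest album_id).
    conn.execute(
        "UPDATE pb_album_metadata"
        "   SET primary_digital_flag = 0"
        " WHERE standard_upc = ?"
        "   AND primary_digital_flag <> 0"
        "   AND album_id <> (SELECT MAX(album_id)"
        "                      FROM pb_album_metadata"
        "                     WHERE standard_upc = ?)",
        (upc, upc),
    )

flags = dict(
    conn.execute("SELECT album_id, primary_digital_flag FROM pb_album_metadata")
)
```

Only the older albums for the chosen UPC lose the flag; the latest album and the other UPC's album are left alone.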

Me (again):

oh yeah...and PL/SQL is/was built on top of Ada, FYI.

Mr. M:

"I still believe that if application developers better understood how a DB works (Oracle, MySQL, SQLServer, etc) they would end up writing less code. Database constraints alone force you to write less (and better) code simultaneously ensuring good data."

Huh? What are we not understanding? What would be an example of a constraint that would force us to write less and better code?

Me:

CHECK, NOT NULL (same as CHECK) and FOREIGN KEY constraints all fit into that category.

Instead of having to check if a value is NULL when submitting it to the database, just submit and let the database throw an error, which you'll conveniently capture in your Exception block. Same goes for CHECK constraints, columnA should be either one of three values (test, test1, test3), you won't have to check that in your application code, just catch the exception. FKs ensure good data (proper relationships).

A different perspective. If you are going to pay for it, use the damn thing. If you don't care, don't do it. But over the course of an "enterprisey" application, you'll end up writing more code and make maintenance that much more difficult (did I do it there?). Just those constraints will force you and the teams of application developers to put good data into the database.

You can still do it in the application of course (form validation is a good place)...
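A small sketch of what "just submit and catch the exception" could look like. It uses SQLite through Python's standard library purely so it can be run anywhere; the tables, column names, and the `try_insert` helper are made up for illustration, and Oracle would raise its own ORA- errors rather than `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers (id),"
    " status TEXT NOT NULL CHECK (status IN ('test', 'test1', 'test3')))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")
conn.commit()


def try_insert(customer_id, status):
    """Submit the row as-is and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
                (customer_id, status),
            )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(1, "test"),    # valid row
    try_insert(1, None),      # NOT NULL violation
    try_insert(1, "bogus"),   # CHECK violation
    try_insert(99, "test1"),  # FOREIGN KEY violation
]
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

No validation logic lives in the application here; the three bad rows are turned away by the constraints alone.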

Mr. M:

Ahh, jeez dude, I wasn't sure if you were referring to the literal "constraint" or not.

Dude, even f_cksticks like ~~redacted~~ I think have a decent enough understanding of when and how to use db constraints. It's when you get into things like cursors or cost plans of subselects and anonymous tables (i think that's the name for it - where I say select something from (select some other crap)). Then we defer to db gurus like yourself.

But dude....."you won't have to check that in your application code, just catch the exception".......uh, don't ever repeat that off our little email group. And definitely don't go posting that on your blog. F_ck me man, it's a damn good thing we keep you db folks caged up in that rdbms box....

Me:

So we've had this discussion at work...for a high transaction system, do Java/C/etc handle exceptions well or what?

Why is it bad to deal with exceptions rather than coding to avoid them?

I highly doubt even ~~redacted~~ understood database constraints...him and his cohorts believed all database software would be commoditized and MySQL would be king in short order.

Mr. M:

"for a high transaction system"

Or for any system really....

To use your example of check constraints (is this value one of....) or not null constraints, checking these rules in the Java code and in the database code would seem to violate DRY. But we do that a lot, and it is acceptable in certain cases. For instance, we also probably violate DRY if we're checking this same rule say in Javascript on the front end. But we accept this tiny violation of DRY because it tends to make for a better user experience and as a performance gain too, for we avoid the round trip to the server. Now, what you're advocating here is close to the same thing. You're basically saying, don't check for the not null constraint in Java code, just go ahead and hit the database, let the database throw back an exception to the Java code, presumably correct the problem, and then make another roundtrip to the database again. Dude, what are you thinking?!? This to say nothing of the fact that this also could be considered a violation of Fail Fast, and a violation of Defensive Programming - what happens if the dba forgot to add the not null constraint in production?
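The fail-fast argument in this email can be sketched as follows: validate in the application first, so invalid input never costs a round trip, while the database constraint stays in place as a backstop. This is an illustrative sketch only; the `items` table, the `save` helper, and the `round_trips` counter are hypothetical, and it uses Python with SQLite rather than the Java and Oracle stack being discussed.

```python
import sqlite3

# Duplicated from the CHECK constraint on purpose (the small DRY violation described above).
ALLOWED_STATUSES = {"test", "test1", "test3"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL CHECK (status IN ('test', 'test1', 'test3')))"
)
round_trips = 0  # counts how many times we actually hit the database


def save(status):
    """Validate in the application first; only valid rows reach the database."""
    global round_trips
    if status is None:
        return "status is required"
    if status not in ALLOWED_STATUSES:
        return "status must be one of " + ", ".join(sorted(ALLOWED_STATUSES))
    round_trips += 1
    try:
        with conn:
            conn.execute("INSERT INTO items (status) VALUES (?)", (status,))
    except sqlite3.IntegrityError:
        # The constraint remains as a second line of defence.
        return "rejected by database"
    return "saved"


outcomes = [save("test"), save(None), save("bogus")]
```

Three submissions, one round trip: the two bad values are reported to the caller without touching the database at all.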

Dude, listen to this guy. For a "high transaction system" basically you want to treat the database, as much as you can, as just a dumb data holder. A f_cking dumpster, that you just throw sh_t into and pull shit back out, using no vendor-specific features if at all possible.

Again, for we've had this discussion, but even in everyday apps, not just on Wall Street, the database is the bottleneck. And the database is the hardest layer to scale. So given those facts, you should design your app to do as little work in the database as possible.

I was laughing at this point because the link above points to one of our consulting architects (I'm not really sure what his role is at this point).

Me:

i agree in any application that you want to minimize the number of round trips...

shocker...he's one of our architects. he's spot on in many instances, but...

database is the bottleneck because people don't know how to write SQL. I'll certainly concede the wall street applications (for the time being anyway), but the rest of us with what we do on a daily basis...Oracle will scale far beyond the demands they place. When that bottleneck shows up, 9 times out of 10 some dumb-ass c#/java guy thought he could write a better query than i. besides, what's the idiot doing anything but a simple join anyway? probably poor data model to start with...and we're right back where we started (sort of).

Mr. M:

"database is the bottleneck because people don't know how to write SQL.....some dumb-ass c#/java guy thought he could write a better query than i."

Dude, I'll grant you, people don't know how to write SQL, myself included. But that's not always why the database is the bottleneck. I think it's safe to say that's not even the majority of the reason. Yes, there are apps written by people who were just idiots, and they needlessly pummel the database, but that's definitely not the majority of scenarios. At my work the database is the bottleneck, and we run RAC in production. It's my understanding that even with RAC, there is a limit to how much you can scale that out. But any layer up from the database we are basically unlimited in how much we can scale that out. So it's very easy to stick another Apache box in front, or fire up another Weblogic server. But I can't do that with the database. We have 24 Weblogic servers for ~~redacted~~. The database is the bottleneck. And we don't have shitty sql code in the app. In fact, we have very few hand-written queries anywhere in the app. Persisting something to a database is really a low-level concern that as an application developer I shouldn't even have to be bothered with, except for the rare corner cases where the persistence abstraction I'm using is just a little bit too generic to handle things effectively. And we don't use these ORMs because we don't know how to write sql. To effectively use an ORM tool requires a deeper understanding of sql and databases than simply being able to write solid SQL code. (Not saying Java devs who use ORMs know SQL better than a dba, just that it requires better sql skills than is required of a Java dev to simply write JDBC/SQL.) Now, before you try to tell me that my ORM library isn't able to write as efficient of sql code as a dba of your caliber, keep in mind that ORM tools are pretty advanced. They're able to intelligently do things like batch sql updates, and let you apply transactional semantics much easier than with raw jdbc/sql. 
But the overwhelming reason developers have so thoroughly adopted ORM is because Structured Query Language is such a nasty piece of shit for expressing application logic. SQL is a declarative, procedural language. It's totally unsuited for writing application logic! This, more than anything else, is why organizations and dev teams should seek to restrict what is handled within a relational database as much as possible - because the programming interface to it is a fucking ancient backward dying dinosaur.
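As a rough illustration of the "batch sql updates" and "transactional semantics" mentioned in this email, here is a minimal sketch using Python's built-in `sqlite3` module. It is not how Hibernate or any other ORM works internally; the `accounts` table and `apply_batch` helper are hypothetical, and the point is only to show a group of updates sent as one batch that either all commit or all roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50), (3, 0)])
conn.commit()


def apply_batch(changes):
    """Apply (delta, account_id) pairs as one batch; all succeed or none do."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.executemany(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", changes
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = apply_batch([(-30, 1), (30, 2)])        # both updates commit together
failed = apply_batch([(-10, 1), (-500, 2)])  # second update breaks the CHECK
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second batch leaves no trace: the first update in it is rolled back along with the failing one.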

Mr. V (note, not Mr. M):

My 2 canadian cents:
The polyglot approach "... use different languages for different domain ..."
Database was developed to manipulate data and should remain there.
General purpose language was developed to encapsulate logic and should remain in that domain.
You should not use DB to encapsulate business logic (in my opinion) no more than you would use HTML to create complex logic.

While Java, C#, Python, etc are described as General Purpose languages, they, too, are really domain-constrained. Their expressiveness is confined (and should be) to expressing low-level, terse, explicit, verbose, and repetitive logic (if that makes any sense). Languages such as these are more suitable for low-level abstraction on top of which, richer, more expressive languages can be built. We are seeing this now with the emergence of languages on the JVM (and arguably on the .Net's CLR).

I think SQL as we know it will take a back seat and a smart developer somewhere will create a new domain-specific language that lives on the VM and will push the SQL expression out of the RDBMS and closer to the code that needs it. We are not there yet, but Microsoft is trying (see LINQ and all ORM papers). This is not to say that there won't be instances where tuning in the SQL-native language won't be necessary. However, 80 to 90% of simple CRUD cases will be handled closer to the logic code that uses the data code.

Again, that's my 2 canadian cents... I could go on. But I have a meeting with ~~redacted~~.

Mr. V, I believe, is a little bit more sensible. Mr. M on the other hand is just trying to rile (sp?) me up.

Me:

Someone will probably create something like that, but it still gets at the heart of one of my arguments, many developers don't know how to use a database thus will go to any means to circumvent it. Embrace it I say.

Ultimately for me, it comes down to simplicity. I can write an application using PL/SQL that will scale very well for 90% of the solutions. Of course that doesn't include the "fancy" javascript/Ajax stuff needed for the UI. In my ever so humble opinion, this is a much better solution for a business in the long run.
1. You're more likely to change the middle tier than the database tier. Java, asp, Ruby, what's next?
2. Fewer moving parts means lower cost of ownership, even with the CPU costs. One person, one, was able to build and maintain a 350 page website. Be hardpressed to do that with the more expressive languages.

I think all of us are purists and very passionate about what we do. It's probably the main reason we get along. I thoroughly enjoy these conversations because it does force me to think...and that's always a good thing.

Mr. V:

Haaa chet.
You sorta made my point then diverged away. Building an app in all PL/SQL is dangerous. It's no safer than me building an app in all Java. I can build a very limited app in all Java. The moment I need to interact with another domain (UI, data, low-level native code, etc), I have to switch to something that is closer to what I am trying to do. If I need to create UI, I will pick a ui-centric environment, when I need to talk to DB, I will connect to a db and send in my SQL, and so forth. I will use Java as the glue to bring it all together.

In the end, we may be saying the same thing, but using different accent. O well.

And that's where it ended. I must say it's always fun. Mr. M and Mr. V are both very smart individuals and I highly respect what they do. We have different perspectives...but I think they listen, if only a little, as I listen to them. Their voices creep up on me especially now...which is definitely a good thing.

Monday, July 6, 2009

Classic: Application Developers vs. Database Developers

One of my favorite "articles" of all time. I love these types of conversations, in the DB or the Middle Tier? For the vast majority of us, the Database will do just fine. As I've learned more about the data grids and the like, there are trade-offs, which aren't often discussed. One way or another you lose data (say you decide to only UPDATE one time instead of 60). Originally posted on February 20, 2008. Enjoy.

It started innocently enough with this article. I sent it out to about 20 colleagues.

The best line from the article:
Jerry: "Yeah, databases cause lots of headaches. They crash all the time, corrupt data, etc. Using text files is better."

One of my more recently arrived colleagues (I'll call him Mr. M) replied to everyone with this statement:

"Kind of funny actually, databases are less and less important at the large investment banks, where they basically load everything up into a data grid across a several hundred node cluster. Writing to the db is way too slow."

This started a day long exchange of emails. What follows is the entire thread (up until my last post tonight).

Me:
"I would just argue that they don’t necessarily know how to write to databases. I would however love to see benchmarking done on both methods. Would be an interesting test..."

Mr. M:
"Well, my understanding is they just can’t scale out the db enough. Even something like Oracle RAC won’t work. And outside of the military, these are probably the top 1% of programmers in the world building this stuff."

Me:
"A benchmark would be the only way I would believe it.

If you said the top 1% of database developers tried it and failed, I would be more likely to agree.

My experience is that application developers != database developers. Different type of thinking involved."

Mr. M:
"'A benchmark would be the only way I would believe it.'

Do you need a benchmark before you would believe in-memory retrieval is faster than disk retrieval? Essentially, this is what we’re talking about.

'If you said the top 1% of database developers tried it and failed, I would be more likely to agree. My experience is that application developers != database developers. Different type of thinking involved.'

Why? It’s an issue to do with application performance not simply database performance. Database concerns are a subset of application concerns, essentially a specialization, requiring less encompassing knowledge. ;)

From the article you linked to (http://www.watersonline.com/public/showPage.html?page=432587)

"Better data management is the answer, says Lewis Foti, manager of high-performance computing and grid at The Royal Bank of Scotland (RBS) global banking and markets. "For very large compute arrays, the key issue is data starvation and saturation. This problem requires data grids with high bandwidth and scalable, parallel access,
...
Banks are learning that data management in a distributed grid environment is very different from online transaction processing. "With so many data sources, distribution channels, demands for aggregation and analytics, surges in data volumes and complex dynamics between the flows, we need to manage 'data in motion' and give up the notion that data is somehow stored. It's dynamic, not static," says Michael Di Stefano, vice president and architect for financial services at GemStone Systems
...
There is even some debate over how small a unit of work can be put on today's grids. Di Stefano at GemStone, for example, says, "One client has gone from 200 trades per second in a program trading application to more than 6,000 trades per second. This shows what the technology can do."

Yep, the writing is on the wall. Oracle knows it too.

http://www.google.com/search?hl=en&q=oracle+buys+tangosol&btnG=Google+Search"

Me:
"Good points. If it is in-memory it would be faster. I have not had the pleasure to work on such a system.

I do disagree with the database concerns being a subset of application concerns. The data drives the app. We’re probably getting religious at this point (or am I)."

Mr. M:
"'The data drives the app.'

Exactly, but who’s to say where the data comes from or in what format? My application data may reside completely in xml files, or maybe I get it from some third party web services a la the en vogue “mashup.” Heck, I may not even need to worry about a database anymore…. http://www.amazon.com/gp/browse.html?node=16427261 The database is only one particular concern of the overall application. And it’s the application that matters. Data is useless if it just sits on a disk somewhere. It’s the ways in which the application lets the users view and manipulate the data that adds value to the business.

Yep, definitely a different type of thinking between application developers and database developers."

Me:
"Definitely religious now.

Applications come and go, data stays the same. Think Green Screens, EJBs, Ruby…what’s next?"

Mr. M:
"'Applications come and go'

Exactly. Businesses are not static, nor are the markets they compete in. Changing applications are a function of changing business processes and changing markets.

'data stays the same.'

Nonsense. Otherwise UPDATE would not be an SQL reserved word. If you mean database technology stays the same, well, I’m more inclined to agree with that.

'Think Green Screens, EJBs, Ruby...what’s next?'

Whatever comes along to let the business more effectively respond to current market realities. Application platforms have evolved much faster than database platforms have. They’ve had to, their sphere of operation is much broader than that of databases, this is only natural, they deal with much broader concerns than do databases. Databases in the internet era function in essentially the same role they did in the era of dumb terminals. Clearly application platforms have evolved orders of magnitude more. Hence the statement, database concerns are a subset of application concerns.

Here’s a simple test….if I take some business application and I’m forced to throw away one or the other, either the database or the appl- wait a second, it doesn’t even make sense to finish it, does it? The business can live without the database. I could do all kinds of things with the data, I could stick it anywhere. The business can’t live without the application though. Another way to look at is, what do the business users look at, test, approve, and use? The database? Of course not, they look at the application. They could care less whether the data sits on disk in an RDBMS, xml, or flat files."

Me:
"We obviously violently disagree.

Without the database (and I use database and data interchangeably), the business could no longer function. The app is meaningless. How would you contact your customer? You couldn’t find it.

'Exactly. Businesses are not static, nor are the markets they compete in. Changing applications are a function of changing business processes and changing markets.'

Poorly designed applications…that is all."

A Feisty Colleague:
"Using data and database interchangeably is incorrect. A database is a mechanism for data storage. XML data sets and flat files are mechanisms for data storage, too. So is a file cabinet, because, the data doesn’t have to be electronic, it could be … gasp! … on paper, and the application to use that data would be hands for holding the paper and a pencil to update and add data to the page."

Me:
"No it isn’t. I take into account xml files, flat files, web services (but not paper, unless it’s scanned) and all that. It would be consumed by the database and then accessed by the application via SQL.

(that’s for Mr. M and the feisty one)"

At which point someone forwarded the home page for Oracle's TimesTen In-Memory Database.

Me:
"A database on/in the mid-tier...Perfect!"

Mr. M:
"Implicit acknowledgment that disk IO operations that come with traditional database access simply can’t match the performance of in-memory data access (a point which you previously were unconvinced of but now seem perfectly accepting of the idea once you see it’s got Oracle’s imprimatur on it).

Of course, why any application developer would want to program against an SQL interface if they weren’t forced to is beyond me. It is orthogonal to the programming model of most application platform languages.

Surely Oracle recognize this fact too or they wouldn’t be buying Tangosol and other data grid technologies. Of course, most of those products are far more technically advanced than TimesTen or anything Oracle has in that space.

Incidentally, it’s illustrative to note that Coherence and other products like it were for the most part designed and built by application programmers. The development of all these products is pretty much driven by the needs of the large investment banks on Wall Street. These trading applications simply had too many concurrent transactions to use an RDBMS (a problem quite a number of public domains now share, most famously google.com, nope, no RDBMS there, yet miraculously there is still data). The database just simply would not scale to such a degree. So the application developers, by necessity, came up with an alternate solution that did work, a fully transactional cache of data replicated across a cluster with node numbers in the thousands, and no relational model whatsoever to speak of. A perfect example of how database concerns are only one, sometimes small, concern amongst many that application developers must be aware of and ready to solve."

Me:
"Like you said initially, the top 1%.

Many of us will never touch a system like this.

I will certainly concede that it is faster (still would love to see benchmarking though), but that still leaves 99% of the applications out there that do not require that kind of performance."

Me (again):
"And don’t forget, I use data and database interchangeably. Applications are nothing without the data right?

As to the object/relational impedance mismatch...well, more people that don’t know how to work in sets. Looping is what they understand. I understand the application side more than you seem to give me credit for.

I’m not saying applications aren’t important, they are. Data (databases) and applications go hand in hand. If the application went away though, they could still access their data via SELECT statements (yes, via an application client tool), however painful that may be. Applications make retrieving data that much easier for our users.

If anyone wants to unsubscribe from this mailing list, just let us know. This is fun for me (I’m guessing Mr. M too)."
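
The sets-versus-looping point in that last email can be sketched roughly as follows. This is only an illustration, not anything from the thread: Python's built-in sqlite3 module stands in for Oracle, and the customers table, its columns, and the function names are all made up for the example.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a hypothetical customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers ("
        " customer_id INTEGER PRIMARY KEY,"
        " region TEXT,"
        " discount INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO customers (customer_id, region) VALUES (?, ?)",
        [(1, "east"), (2, "west"), (3, "east")],
    )
    conn.commit()
    return conn


def discount_by_looping(conn):
    """Row-at-a-time: one SELECT, then one UPDATE per matching row."""
    statements = 1
    rows = conn.execute(
        "SELECT customer_id FROM customers WHERE region = 'east'"
    ).fetchall()
    for (customer_id,) in rows:
        conn.execute(
            "UPDATE customers SET discount = 10 WHERE customer_id = ?",
            (customer_id,),
        )
        statements += 1
    conn.commit()
    return statements


def discount_by_set(conn):
    """Set-based: describe the rows once and let the database do the work."""
    conn.execute("UPDATE customers SET discount = 10 WHERE region = 'east'")
    conn.commit()
    return 1


def snapshot(conn):
    """Return every (customer_id, discount) pair for comparison."""
    return conn.execute(
        "SELECT customer_id, discount FROM customers ORDER BY customer_id"
    ).fetchall()


loop_db, set_db = make_db(), make_db()
loop_statements = discount_by_looping(loop_db)
set_statements = discount_by_set(set_db)
print(loop_statements, set_statements, snapshot(loop_db) == snapshot(set_db))
```

Both versions leave the table in the same state. The looping version simply sends one statement per row, which is where the cost shows up once the row count grows.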

Needless to say it was a fun day. It didn't get [too] personal. More than anything I'm happy to have an equally passionate colleague.

Besides, he claims he was just fracking around with me. ;)

Tuesday, December 9, 2008

Application Developers vs. Database Developers: Part II

You can read the first article here. My application developer friend, Mr. M, emailed me and another fine gentleman this little blurb recently:

Mr. M:

OH YEAH BABY!!! TEN TIMES FASTER!!!! YEAH!!!!!!!!

Hey seriously, what a tub of shit Oracle is. Where does this myth come from that it's such a great platform? Their client tools suck balls and it's generally just a pain in the ass to work with from a developer's point of view. But devs for some reason are under this impression that from the server perspective it's rock solid and performant. Well, it may be solid, but it's a fucking turd. Our dba here - definitely an Oracle guy - has been tasked with looking into moving us onto an oss db. He basically refuses to even look at MySQL, stating that it's a mickey mouse worthless pile of shit (can't really argue with him there lol), so that basically leaves Postgres. So it's been a few weeks now, and he will basically admit now that Postgres completely waxes Oracle as far as performance goes. LOL We run RAC in production too. He's looking at failover, replication, blah blah blah now, we'll see what the verdict on that is. Oh, and Oracle AQ? That's a worthless pile of shit too. Why do they tack a fucking message queue that doesn't play nice with jms onto the fucking database, which is probably already overworked? Oh wait, that's right, they're in the business of selling per cpu licenses! Cocksuckers.

This was prompted by a recent Oracle email blast about the Exadata storage system/Warehouse.

As I did before, I'll just put the email here.

Me:

Agreed, their client tools aren't all that great. Which ones are you using?

I use SQL*Plus (naturally), SQL Developer and JDeveloper. The latter 2 tie in very nicely with Subversion. With JDeveloper, when I want to try out Java, it's got a pretty good suite of tools.

Oracle started out catering to businesses, Microsoft started with the consumer. Oracle has made pretty good strides in making their software more usable while Microsoft has made theirs more scalable.

I haven't used AQ a whole lot and definitely not with Java. I do know that it's billed as JMS compliant.

Postgres has its place and so does Oracle. It is a great platform if you know how to leverage it. PL/SQL is a 3GL (or is it 4) and you can write entire applications using it...and if you do that right, an incredibly robust and scalable application at that.

Mr. M:

"It is a great platform if you know how to leverage it. PL/SQL is a 3GL (or is it 4) and you can write entire applications using it...and if you do that right, an incredibly robust and scalable application at that."

NO!!! NO!!! NOOOOO!!!

I want to beat people like you who say this with a ball-peen hammer. There are only a select few individuals on this earth who can write and read application logic written in SQL. AVOID THIS ANTI-PATTERN AT ALL COSTS! What is it with you f_cking database guys??? Why do you always want to stuff as much crap into the db as possible?

DUDE! SQL IS A PILE OF SHIT FOR EXPRESSING APPLICATION LOGIC!!!

It's fine for querying and manipulating sets of data, in a relational database. But it is a worthless sack of shit for expressing application logic!

I'm having to dig through this f_cking abortion right now because some Oracle f_ckhead thought "you can write entire applications using it!" Blog that, mofo!

This was followed by a package he had been working on. I wouldn't say it was the greatest, but it wasn't all bad either.

Me:

goodness gracious.

"DUDE! SQL IS A PILE OF SHIT FOR EXPRESSING APPLICATION LOGIC!!!"

disagree (naturally). It's incredibly easy to do, you just don't know how yet...and it seems even the Oracle professionals out there don't either.

I'll tell you this, the crazier the SQL or PL/SQL needed to derive and manipulate data the poorer the design. Start with a good design and it all becomes very simple.

Of course note the use of "naturally" in my lexicon. Thanks Jake.

Mr. M:

well dude, we are back to our old discussion - you arguing that procedural sql code is perfectly fine for building apps, and by extension, that the last 20 years of computer science have basically been a misguided lost journey down the meandering, fruitless trail of oop. um.....no. select this from that. otherwise keep that sql crap caged up where it belongs.

btw, do the db guys here suck? seriously. i'm not competent enough to judge. (to be fair, apparently that crap i sent you is fairly old stuff, but still....)

Me:

I would say, based on limited knowledge of software development, that the OOP movement was started because the database (specifically Oracle) was not mature enough to do what was needed. Plus, I seem to recall that the OOP movement was supposed to have solved all the world's problems by now.

It's further propagated due to the needs you specified in our discussion that day at WellCare (i.e. performance). I still believe that if application developers better understood how a DB works (Oracle, MySQL, SQLServer, etc) they would end up writing less code. Database constraints alone force you to write less (and better) code simultaneously ensuring good data.

The code that I did look at (first 1000 lines or so) isn't great.
1. With all those IF THEN ELSE statements it's telling me that there's probably a better way to store the data. Looks like they're missing an attribute that should be contained with a track.
2. using Object (PL/SQL) types to store data in the way they seem to be doing it is not the best way. Again, probably a design issue.
3. When you do something like this:
   UPDATE pb_album_metadata
      SET primary_digital_flag = 0
    WHERE album_id IN (SELECT b.album_id
                         FROM (SELECT a.album_id AS album_id,
                                      MAX(a.album_id) OVER (PARTITION BY a.standard_upc) AS latest_album_id
                                 FROM pb_album_metadata a
                                WHERE a.standard_upc = g_album_tab(1).standard_upc) b
                        WHERE b.album_id <> b.latest_album_id)
      AND primary_digital_flag <> 0;
They should probably have considered end_date as an attribute of the album metadata. While analytic functions are pretty cool, they're more for analyzing (OLAP) and not OLTP environments.
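
To make the end_date suggestion in point 3 a little more concrete, here is a rough sketch. It is not the code from the package under review. Everything in it is an assumption: the table is a cut-down, hypothetical version of pb_album_metadata (the real definition never appears in the thread), the add_album helper and sample UPC are invented, and SQLite via Python's sqlite3 module stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of pb_album_metadata; the real table
# definition is not shown in the thread.
conn.execute(
    "CREATE TABLE pb_album_metadata ("
    " album_id INTEGER PRIMARY KEY,"
    " standard_upc TEXT NOT NULL,"
    " end_date TEXT)"  # NULL means this row is the current one
)


def add_album(album_id, standard_upc, as_of):
    """Close out the current row for this UPC, then insert the new one."""
    with conn:  # both statements commit or roll back together
        conn.execute(
            "UPDATE pb_album_metadata SET end_date = ?"
            " WHERE standard_upc = ? AND end_date IS NULL",
            (as_of, standard_upc),
        )
        conn.execute(
            "INSERT INTO pb_album_metadata (album_id, standard_upc)"
            " VALUES (?, ?)",
            (album_id, standard_upc),
        )


add_album(1, "012345678905", "2008-01-01")
add_album(2, "012345678905", "2008-06-01")
add_album(3, "012345678905", "2008-12-01")

# The current album is simply the row that has not been end-dated.
current = conn.execute(
    "SELECT album_id FROM pb_album_metadata"
    " WHERE standard_upc = ? AND end_date IS NULL",
    ("012345678905",),
).fetchall()
print(current)
```

With an end date on each row, finding or retiring the current album becomes a plain predicate on end_date, so neither the nested subquery nor the analytic function is needed.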

That's for starters and without table definitions...

Me (again):

oh yeah...and PL/SQL is/was built on top of Ada, FYI.

Mr. M:

"I still believe that if application developers better understood how a DB works (Oracle, MySQL, SQLServer, etc) they would end up writing less code. Database constraints alone force you to write less (and better) code simultaneously ensuring good data."

Huh? What are we not understanding? What would be an example of a constraint that would force us to write less and better code?

Me:

CHECK, NOT NULL (same as CHECK) and FOREIGN KEY constraints all fit into that category.

Instead of having to check if a value is NULL when submitting it to the database, just submit and let the database throw an error, which you'll conveniently capture in your Exception block. Same goes for CHECK constraints, columnA should be either one of three values (test, test1, test3), you won't have to check that in your application code, just catch the exception. FKs ensure good data (proper relationships).

A different perspective. If you are going to pay for it, use the damn thing. If you don't care, don't do it. But over the course of an "enterprisey" application, you'll end up writing more code and make maintenance that much more difficult (did I do it there?). Just those constraints will force you and the teams of application developers to put good data into the database.

You can still do it in the application of course (form validation is a good place)...
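
A minimal sketch of the "just submit and catch the exception" idea, for illustration only. SQLite (through Python's sqlite3 module) stands in for Oracle, and the customers and orders tables, the status column, and the try_insert helper are hypothetical. The three allowed values are the ones from the email.

```python
import sqlite3

# SQLite stands in for Oracle here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        status      TEXT    NOT NULL
                    CHECK (status IN ('test', 'test1', 'test3'))
    );
    INSERT INTO customers (customer_id) VALUES (1);
""")


def try_insert(customer_id, status):
    """Submit the row as-is and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, status) VALUES (?, ?)",
                (customer_id, status),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert(1, "test"),    # valid row
    try_insert(1, None),      # violates NOT NULL
    try_insert(1, "bogus"),   # violates CHECK
    try_insert(99, "test1"),  # violates FOREIGN KEY
]
for line in results:
    print(line)
```

The application carries no validation logic of its own here. Every bad row is turned away by a constraint, and only the valid one is stored.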

Mr. M:

Ahh, jeez dude, I wasn't sure if you were referring to the literal "constraint" or not.

Dude, even f_cksticks like ~~redacted~~ I think have a decent enough understanding of when and how to use db constraints. It's when you get into things like cursors or cost plans of subselects and anonymous tables (i think that's the name for it - where I say select something from (select some other crap)). Then we defer to db gurus like yourself.

But dude....."you won't have to check that in your application code, just catch the exception".......uh, don't ever repeat that off our little email group. And definitely don't go posting that on your blog. F_ck me man, it's a damn good thing we keep you db folks caged up in that rdbms box....

Me:

So we've had this discussion at work...for a high transaction system, do Java/C/etc handle exceptions well or what?

Why is it bad to deal with exceptions rather than coding to avoid them?

I highly doubt even ~~redacted~~ understood database constraints...him and his cohorts believed all database software would be commoditized and MySQL would be king in short order.

Mr. M:

"for a high transaction system"

Or for any system really....

To use your example of check constraints (is this value one of....) or not null constraints, checking these rules in the Java code and in the database code would seem to violate DRY. But we do that a lot, and it is acceptable in certain cases. For instance, we also probably violate DRY if we're checking this same rule say in Javascript on the front end. But we accept this tiny violation of DRY because it tends to make for a better user experience and as a performance gain too, for we avoid the round trip to the server. Now, what you're advocating here is close to the same thing. You're basically saying, don't check for the not null constraint in Java code, just go ahead and hit the database, let the database throw back an exception to the Java code, presumably correct the problem, and then make another roundtrip to the database again. Dude, what are you thinking?!? This to say nothing of the fact that this also could be considered a violation of Fail Fast, and a violation of Defensive Programming - what happens if the dba forgot to add the not null constraint in production?
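
The counter-position above (check in the application first, keep the constraint as a backstop) might look something like this sketch. It is an assumption-laden illustration rather than anyone's production code: Python and SQLite stand in for Java and Oracle, and the tasks table, the save_task helper, and the round_trips counter are invented for the example.

```python
import sqlite3

ALLOWED_STATUSES = {"test", "test1", "test3"}  # same rule as the CHECK below

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks ("
    " task_id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL"
    " CHECK (status IN ('test', 'test1', 'test3')))"
)

round_trips = 0  # counts statements actually sent to the database


def save_task(status):
    """Fail fast in application code; keep the constraint as a safety net."""
    global round_trips
    if status is None or status not in ALLOWED_STATUSES:
        return "rejected by application"  # no round trip spent
    round_trips += 1
    try:
        with conn:
            conn.execute("INSERT INTO tasks (status) VALUES (?)", (status,))
        return "saved"
    except sqlite3.IntegrityError:
        # Reached only if the application rule and the constraint drift apart.
        return "rejected by database"


outcomes = [save_task("test"), save_task(None), save_task("bogus")]
print(outcomes, round_trips)
```

The rule now lives in two places, which is the DRY violation being conceded, but the bad values never cost a trip to the database.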

Dude, listen to this guy. For a "high transaction system" basically you want to treat the database, as much as you can, as just a dumb data holder. A f_cking dumpster, that you just throw sh_t into and pull shit back out, using no vendor-specific features if at all possible.

Again, for we've had this discussion, but even in everyday apps, not just on Wall Street, the database is the bottleneck. And the database is the hardest layer to scale. So given those facts, you should design your app to do as little work in the database as possible.

I was laughing at this point because the link above points to one of our consulting architects (I'm not really sure what his role is at this point).

Me:

i agree in any application that you want to minimize the number of round trips...

shocker...he's one of our architects. he's spot on in many instances, but...

database is the bottleneck because people don't know how to write SQL. I'll certainly concede the wall street applications (for the time being anyway), but the rest of us with what we do on a daily basis...Oracle will scale far beyond the demands they place. When that bottleneck shows up, 9 times out of 10 some dumb-ass c#/java guy thought he could write a better query than i. besides, what's the idiot doing anything but a simple join anyway? probably poor data model to start with...and we're right back where we started (sort of).

Mr. M:

"database is the bottleneck because people don't know how to write SQL.....some dumb-ass c#/java guy thought he could write a better query than i."

Dude, I'll grant you, people don't know how to write SQL, myself included. But that's not always why the database is the bottleneck. I think it's safe to say that's not even the majority of the reason. Yes, there are apps written by people who were just idiots, and they needlessly pummel the database, but that's definitely not the majority of scenarios. At my work the database is the bottleneck, and we run RAC in production. It's my understanding that even with RAC, there is a limit to how much you can scale that out. But any layer up from the database we are basically unlimited in how much we can scale that out. So it's very easy to stick another Apache box in front, or fire up another Weblogic server. But I can't do that with the database. We have 24 Weblogic servers for ~~redacted~~. The database is the bottleneck. And we don't have shitty sql code in the app. In fact, we have very few hand-written queries anywhere in the app. Persisting something to a database is really a low-level concern that as an application developer I shouldn't even have to be bothered with, except for the rare corner cases where the persistence abstraction I'm using is just a little bit too generic to handle things effectively. And we don't use these ORMs because we don't know how to write sql. To effectively use an ORM tool requires a deeper understanding of sql and databases than simply being able to write solid SQL code. (Not saying Java devs who use ORMs know SQL better than a dba, just that it requires better sql skills than is required of a Java dev to simply write JDBC/SQL.) Now, before you try to tell me that my ORM library isn't able to write as efficient of sql code as a dba of your caliber, keep in mind that ORM tools are pretty advanced. They're able to intelligently do things like batch sql updates, and let you apply transactional semantics much easier than with raw jdbc/sql. 
But the overwhelming reason developers have so thoroughly adopted ORM is because Structured Query Language is such a nasty piece of shit for expressing application logic. SQL is a declarative, procedural language. It's totally unsuited for writing application logic! This, more than anything else, is why organizations and dev teams should seek to restrict what is handled within a relational database as much as possible - because the programming interface to it is a fucking ancient backward dying dinosaur.
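
For readers unfamiliar with what "batch sql updates" and "transactional semantics" mean in practice, here is a small hand-written sketch of the behaviour an ORM would typically manage on the developer's behalf. No ORM is involved: it uses Python's sqlite3 module directly, and the accounts table and apply_changes helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50), (3, 0)]
)
conn.commit()


def apply_changes(changes):
    """Apply (delta, account_id) pairs as one batch inside one transaction."""
    try:
        with conn:  # commit if every statement succeeds, else roll back
            conn.executemany(
                "UPDATE accounts SET balance = balance + ?"
                " WHERE account_id = ?",
                changes,
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = apply_changes([(-30, 1), (30, 2)])    # both updates succeed
second = apply_changes([(10, 3), (-500, 1)])  # CHECK fails, batch rolls back
balances = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id"
).fetchall()
print(first, second, balances)
```

The second batch fails on its last statement, so the earlier update to account 3 is undone as well. That all-or-nothing grouping is the transactional semantics being referred to.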

Mr. V (note, not Mr. M):

My 2 canadian cents:
The polyglot approach "... use different languages for different domain ..."
Database was developed to manipulate data and should remain there.
General purpose language was developed to encapsulate logic and should remain in that domain.
You should not use DB to encapsulate business logic (in my opinion) no more than you would use HTML to create complex logic.

While Java, C#, Python, etc are described as General Purpose languages, they, too, are really domain-constrained. Their expressiveness is confined (and should be) to express low-level, terse, explicit, verbose, and repetitive logic (if that makes any sense). Languages such as these are more suitable for low-level abstraction on top of which, richer, more expressive languages can be built. We are seeing this now with the emergence of languages on the JVM (and arguably on the .Net's CLR).

I think SQL as we know it will take a back seat and a smart developer somewhere will create a new domain-specific language that lives on the VM and will push the SQL expression out of the RDBMS and closer to the code that needs it. We are not there yet, but Microsoft is trying (see LINQ and all ORM papers). This is not to say that there won't be instances where tuning in the SQL-native language won't be necessary. However, 80 to 90% of simple CRUD cases will be handled closer to the logic code that uses the data code.

Again, that's my 2 canadian cents... I could go on. But I have a meeting with ~~redacted~~.

Mr. V, I believe, is a little bit more sensible. Mr. M on the other hand is just trying to rile (sp?) me up.

Me:

Someone will probably create something like that, but it still gets at the heart of one of my arguments, many developers don't know how to use a database thus will go to any means to circumvent it. Embrace it I say.

Ultimately for me, it comes down to simplicity. I can write an application using PL/SQL that will scale very well for 90% of the solutions. Of course that doesn't include the "fancy" javascript/Ajax stuff needed for the UI. In my ever so humble opinion, this is a much better solution for a business in the long run.
1. You're more likely to change the middle tier than the database tier. Java, asp, Ruby, what's next?
2. Fewer moving parts means lower cost of ownership, even with the CPU costs. One person, one, was able to build and maintain a 350 page website. You'd be hard-pressed to do that with the more expressive languages.

I think all of us are purists and very passionate about what we do. It's probably the main reason we get along. I thoroughly enjoy these conversations because it does force me to think...and that's always a good thing.

Mr. V:

Haaa chet.
You sorta made my point then diverged away. Building an app in all PL/SQL is dangerous. It's no safer than me building an app in all Java. I can build a very limited app in all Java. The moment I need to interact with another domain (UI, data, low-level native code, etc), I have to switch to something that is closer to what I am trying to do. If I need to create UI, I will pick a ui-centric environment, when I need to talk to DB, I will connect to a db and send in my SQL, and so forth. I will use Java as the glue to bring it all together.

In the end, we may be saying the same thing, but using different accent. O well.