Showing posts with label gmyers.

Sunday, January 3, 2010

2009 - In Review

Crazy year...but that seems to be the norm. Is crazy my normal? I wouldn't be surprised.

Numbers
  • 289 - Number of posts this year.
    • 37 - September had the highest number of posts.
    • 18 - February had the lowest.
    • 24 - Average posts per month.
    • 76 - Number of days not covered by a post. In reality though, there were numerous days where I posted 2 or more times.
  • 439 - Number of posts it took to get to the magical $100 mark with Google AdSense. I don't even want to do the math on that. BTW, I reached the $100 mark on December 28, 2009. Here's to perseverance.
  • 94,998 - Pageviews according to Google Analytics
  • 66,313 - Visits according to Google Analytics
  • 465,963 - Page Requests according to my web server (GoDaddy)
  • 1,015,934 - Server Requests according to my web server
I'm not sure what accounts for the disparity between the web server stats and Google Analytics. I tend to put more faith in Google Analytics though. (Of course, seeing 1 million is cool, even if it is only server requests.)

Pretty Pictures
Here's a screenshot from my Google Analytics page...in July, I hit 5,000 visits for the first time.

july analytics

I'm still not really clear on what happened here

wtf?

That was a 2,000 visit jump. I originally thought it had to do with my shameless pining to go to OOW 09...but that was late August, early September.

And to give you an idea of the progression since I started:

wow!

The 2,000 hit jump is much more dramatic in that picture.

Guest Authors

Back in July (well, technically in May), I opened up the blog to guest authors. Mainly I wanted to give people of all experience levels an opportunity to try out their writing skills...to see if they would like it. I didn't exclude established bloggers though. Of the 5 who participated, 4 already had blogs and 1 had been considering it.
I know Ted's got one (almost) ready for posting and I'm trying to talk Jeff Haynes and Brad Tumy into doing guest spots as well.

Thanks to all of you who participated...I can't say that enough.

So I hope you enjoyed 2009 as much as I did. I'm not sure how long I can keep this up...it's a lot of work. I think it's worth it though.

Monday, December 21, 2009

How Do You Normalize a Tweet?

Second post by Mr. Myers; you can read his first one, How To Kill a Code Review, here. I have always liked design topics, and I don't think they are covered enough on the web, which is why I liked this one. I have often wondered what trade-offs designers make for these types of applications (Twitter, Facebook, etc.). Are they even "designed" by a data modeler, or are they created by application developers? I'm not sure it matters to those companies, as they are successful (in a strange, no-business-model kind of way) and, I don't believe, represent many of the realities that we as Oracle professionals are likely to deal with.

Firstly, I don't tweet. Alex (Gorbachev) mentioned it at a Sydney meetup. I had a look but didn't get entrenched, and I assume there will be others out there who aren't tweeters. Suffice it to say, a 'tweet' is a message broadcast by a Twitter user to the Twitter consumers, up to 140 characters long.

So what's to normalise? Isn't it just a value? Even Oracle 6 could cope with VARCHAR2(140).

But actually, a tweet isn't just a simple value.
A search for "beer" would turn up all messages that included #beer.
Similarly, the @ sign followed by a username allows users to send messages directly to each other. A message with @example would be directed at the user [example] although it can still be read by anyone.
Source: Wikipedia

First normal form states:
There's no left-to-right ordering to the columns.

Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
Source: Wikipedia

The problem is that the tweet "@tom Come for a #beer or #burger. Don't let @harry come" definitely has hidden components, and the sequencing within the message is just as important.
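Those hidden components can be pulled out of the raw text with a regular expression. A hypothetical sketch in Oracle SQL (the inline view and aliases are invented for illustration) that extracts the hashtags from our example tweet:

```sql
-- Hypothetical sketch: extract every #tag from one tweet using
-- REGEXP_SUBSTR's occurrence parameter and the CONNECT BY LEVEL
-- row-generation trick. Swap '#\w+' for '@\w+' to get the mentions.
SELECT REGEXP_SUBSTR(tweet_text, '#\w+', 1, LEVEL) AS tag
FROM   (SELECT '@tom Come for a #beer or #burger. Don''t let @harry come'
               AS tweet_text
        FROM   dual)
CONNECT BY REGEXP_SUBSTR(tweet_text, '#\w+', 1, LEVEL) IS NOT NULL;
```

Note that this finds the tokens but throws away their position in the message, which is exactly the sequencing problem described above.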

In a practical implementation, we would probably have the following tables:
TWITTER_USERS      : username (eg @tom), date_joined, email...
TAGS               : tag (eg #beer)
TWEETS             : tweet_id (surrogate key), created_by (referencing twitter_users), created_timestamp, tweet_text...
TWEET_TAGS         : tweet_id, tag (eg #beer)
TWEET_DESTINATIONS : tweet_id, username (eg @tom)
Our message would have the two child tag records (#beer and #burger) and two child destination records (@tom and @harry).
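As a concrete sketch, the tables above might look like this in DDL (names and types are assumptions, not a production schema; the standalone TAGS table is omitted for brevity):

```sql
-- Minimal DDL sketch of the tweet model described above.
CREATE TABLE twitter_users (
  username     VARCHAR2(20) PRIMARY KEY,        -- eg '@tom'
  date_joined  DATE,
  email        VARCHAR2(100)
);

CREATE TABLE tweets (
  tweet_id          NUMBER PRIMARY KEY,         -- surrogate key
  created_by        VARCHAR2(20) REFERENCES twitter_users,
  created_timestamp TIMESTAMP,
  tweet_text        VARCHAR2(140)               -- the denormalised copy
);

CREATE TABLE tweet_tags (
  tweet_id NUMBER REFERENCES tweets,
  tag      VARCHAR2(140),                       -- eg '#beer'
  PRIMARY KEY (tweet_id, tag)
);

CREATE TABLE tweet_destinations (
  tweet_id NUMBER REFERENCES tweets,
  username VARCHAR2(20) REFERENCES twitter_users,
  PRIMARY KEY (tweet_id, username)
);
```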

At the logical level, we are not properly normalized: the tweet_text duplicates information from the child entities, with the potential for inconsistencies between them. We could argue that the tweet only seems to contain duplicate information but is really different. Is that just being picky?

I am not suggesting the relational model is wrong, broken, incomplete or inadequate. Quite the reverse, in fact. In this case the value of the model is that it tells us the problems that will arise when we denormalise data.

For example, if @harry deletes his Twitter account (because he was never invited for beers), do we delete the tweet_dest that referred to him, or do we keep it and not enforce that referential integrity constraint? If we delete the tweet_dest, we have an inconsistency between the tweet_text attribute and the tweet_dest child entities. Or maybe we delete the tweet entity itself and all its children. Those are really choices for the business (possibly with some legal implications though).
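Those choices map directly onto how (or whether) the foreign key is declared. A hedged sketch, using the assumed table names from earlier (only one option would apply in practice):

```sql
-- Option 1: delete his tweet_destinations with him.
ALTER TABLE tweet_destinations ADD CONSTRAINT td_user_fk
  FOREIGN KEY (username) REFERENCES twitter_users (username)
  ON DELETE CASCADE;

-- Option 2: declare the same constraint without ON DELETE CASCADE, so
-- the delete of the user is simply blocked while child rows exist.

-- Option 3: declare no constraint at all and accept orphaned rows --
-- the "don't enforce referential integrity" choice.
```

Note that none of these options keeps tweet_text consistent with the child rows; that inconsistency is exactly the cost of the denormalisation.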

I don't have a solution to the logical model representation, and would be interested in feedback. But not by twitter please :)

Sunday, August 23, 2009

How To Kill a Code Review

Today's guest post is from Gary Myers from Igor's Oracle Lab. I was first introduced to Gary via comments left here. I can't find the first one of course...but he has left plenty of them. All well thought out and informative. Most recently he introduced me to the ability to do a bulk bind using %ROWTYPE here.

This is a topic that is all too often ignored, as you all know.


Steven Feuerstein states here that "Everyone knows that code review is a good idea."

The problem is what happens after the review.

Once upon a time there was a piece of code. That code had been in production for a long time, ran pretty slow but not slow enough that it had reached the top of the pile of stuff people complained about.

A change was done to that code for a new enhancement, and in one of those bursts of enthusiasm that sometimes hits a development team, it got subjected to a code review.

The whole structure of that code was ugly, with unnecessary nested cursor loops. The big kicker for performance was that at the end of one of these inner loops was a TRUNCATE TABLE. Because when you are deleting all the rows from a table, everyone *KNOWS* that a truncate is fastest, right? Of course, since TRUNCATE is DDL, all the SQL using that table inside that loop was getting re-parsed each and every time through the loop.
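I didn't see the original code, but the anti-pattern would have looked something like this hypothetical reconstruction (table and column names invented):

```sql
-- Hypothetical sketch of the anti-pattern: DDL inside a loop.
BEGIN
  FOR rec IN (SELECT id FROM driver_table) LOOP
    INSERT INTO work_table (id) VALUES (rec.id);
    -- ... processing that queries work_table ...

    -- TRUNCATE is DDL, so it invalidates every cached cursor on
    -- work_table; each statement above is re-parsed on the next pass.
    EXECUTE IMMEDIATE 'TRUNCATE TABLE work_table';
  END LOOP;
END;
/
```

Swapping the EXECUTE IMMEDIATE for a plain `DELETE FROM work_table;` keeps everything as DML, so the cursors stay cached between iterations.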

There were other problems in the code too. I believe one was with variables not being re-initialized at various points in the loop, so there was a risk of incorrect results in some unlikely cases.

The verdict of the review was that the code needed a rewrite. The problem was that, since it was already in production, no one wanted to admit there were bugs in it (and the users hadn't spotted any incorrect results). The new code would go into a future patch, but that wouldn't go live for months. However, the change had been promised for delivery to a test environment, and a rewrite would mean missing that deadline.

There were three options: a quick fix to improve performance; a rewrite that missed the deadline; or the quick fix now, meeting the deadline, followed by a later rewrite to fix the underlying problems (though those problems would probably have been blamed on the quick fix).

A compromise was reached. Since the change to the code didn't actually introduce any new bugs, it would be allowed to go through to test with no changes from the review. And there was a promise to actually rewrite the code.

Of course, once the delivery was done, other priorities took precedence. I don't know whether the code ever got the rewrite, but I suspect not. What is certain is that, for at least six months, a batch job took hours when a five-minute code change could have cut it to minutes.

At least the developers who participated in the review learnt that a TRUNCATE has drawbacks. The code reviews, though, pretty much never happened again.