Thursday, December 20, 2012

PDI: Pass Parameters to Jobs/Transformations

I had been working on trying to get a process to run once for each file. I used the Get File Names step followed by the Copy rows to result step. I had placed this in front of my Text file input step, which is where you define the file for further processing.

That method produced a stream (that's what it's called in PDI) with each and every file and each and every record in those files. If I were just loading that into a table, it would have worked. However, I was assigning an identifier to each file using a database sequence. I needed a new sequence value for each file, but I wasn't getting one.
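For the record, the behavior I was after looks something like this rough Python sketch. The filenames and the `read_rows` function here are made up for illustration; in the real job the names come from Get File Names and the identifier from a database sequence:

```python
from itertools import count

def load_files(filenames, read_rows):
    """Sketch of the 'run once per file' pattern: loop over the files
    (one result row each) and load that one file's rows, all stamped
    with a fresh per-file identifier."""
    seq = count(start=1)              # stands in for the database sequence
    loaded = []
    for name in filenames:            # one execution per file
        file_id = next(seq)           # a new sequence value for THIS file
        for row in read_rows(name):   # the per-file load
            loaded.append((file_id, name, row))
    return loaded

# Two fake files with two rows each: each file gets its own identifier.
rows = load_files(["a.csv", "b.csv"], lambda name: ["r1", "r2"])
```

With the single-stream approach I started with, every row of every file ended up under the same identifier; looping at the job level is what gets a fresh value per file.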

With some help and pointers from the ##pentaho IRC channel, I found this post (more on that one in the future), Run Kettle Job for each Row. I downloaded the sample provided to see how it worked.



The calc dates transformation just generates a lot of rows. Not much to see there. The magic, at least for me, was in the run for each row job entry.



Specifically, the Write to log step. (I have this need to see things; since I don't understand everything about the tool yet, Write to log gives me that ability.)



See date, or better, ${date}? That's how you reference parameters and variables.

I ran the job and watched the date scroll by. Nice. Then I tried to plug it into my job.

Zippo. Instead of seeing "this is my filename: /data/pentaho/blah/test.csv" in the log output, I just saw "this is my filename:". Ugh. I went back to the sample and plugged in my stuff. It worked. Yay. Went back to mine, it didn't. Gah! When I tried changing the names, I'd just see "this is my filename: ${new_parameter_name}", so it wasn't resolving to the value.

Finally...after comparing XML for the sample file and mine and finding no real differences, I just about gave up.

One last gasp, though: I went to the IRC channel and asked if there was some way to see the job or transformation settings. No one was home. I tried right-clicking to bring up the context menu, and there was Job Settings.



Job Settings brought up this one:



date is defined there. I checked mine. Nothing defined. Added filename to mine, ran it, Success!
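In other words, a parameter has to be declared in Job Settings before ${name} will resolve. As a rough mental model of what I observed (this is a Python sketch, not PDI's actual code): an undeclared name passes through as literal text, while a declared-but-empty one resolves to nothing:

```python
import re

def substitute(text, params):
    """Kettle-style ${name} substitution, as I understand it: a name
    present in params resolves to its value (even an empty one); an
    unknown name is left as the literal ${name} token."""
    def repl(match):
        name = match.group(1)
        return params.get(name, match.group(0))  # leave unknown tokens alone
    return re.sub(r"\$\{(\w+)\}", repl, text)

# Declared in the job's settings and given a value: resolves.
print(substitute("this is my filename: ${filename}",
                 {"filename": "/data/pentaho/blah/test.csv"}))
# Declared but never assigned: resolves to an empty string.
print(substitute("this is my filename: ${filename}", {"filename": ""}))
# Never declared at all: the token comes through untouched.
print(substitute("this is my filename: ${filename}", {}))
```

That matches the two failure modes I saw in my log: the empty "this is my filename:" and the unresolved "${new_parameter_name}".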

Wednesday, December 19, 2012

Learning Pentaho Data Integrator

aka Kettle, aka PDI.

I've recently taken on a data integration gig and I'll be using Pentaho Data Integrator (PDI). I've read about Pentaho for years but never got around to actually using it. I'm excited about the opportunity.

What is PDI?

...delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools.

I'll be using the enterprise edition (EE), which is supported, similar to how RedHat works...I think.

This post is mainly for me, naturally. I'm going to list out the references I've found so far and add to it over time. Similar to what I do (err, did) for Learning OBIEE.

Actually, I'll just start with the helpful email I received after being added to the account.
I love that you can download and play with the software yourself. Of course the Community Edition (CE) is open source, so that makes sense. I'm not sure if you can get the EE version for free though.

There's a community page as well with links to a lot of great resources. So far, my favorite has to be the IRC channel hosted at freenode. Note that there are two hash signs, as in ##pentaho. I've been lurking there for a few weeks and finally got up the nerve (what? me shy?) last week. HansVA and mburgess_pdi helped get me moving again on a particular problem. Good stuff.

I'm sure I'll add more as time goes on. That's it for now.

Update after original posting...
  • (Kettle, aka PDI) Wiki

Friday, November 9, 2012

VirtualBox 4.2.4

I was working on my OBIEE Test Lab today. Having network issues, because I'm prone to those. I decided to look up The Fat Bloke as he seems to be the resident expert on all things VirtualBox. As to my network issues, The Fat Bloke had this great post back in June, which helped me understand how VirtualBox network interfaces work.

So, as I said, I was looking for some new articles. I ran across this one, What's New in Oracle VM VirtualBox 4.2?. There was this section on Groups. I tried it out on mine...it didn't work. I checked my version: 4.1.something. Hmmm. The article was dated September; I should have this. No sooner did I think that than I got an update message from VirtualBox saying a new release was available. Sweet!

So now I'm running the latest:



So what is this group stuff? You can go read the article, but I'll quote the important stuff here:

Groups allow you to organize your VM library in a sensible way, e.g. by platform type, by project, by version, by whatever. To create groups you can drag one VM onto another or select one or more VM's and choose Machine...Group from the menu bar. You can expand and collapse groups to save screen real estate, and you can Enter and Leave a group (think iPad navigation here) by using the right and left arrow keys when groups are selected.

But groups are more than passive folders, because you can now also perform operations on groups, rather than all the individual VMs. So if you have a multi-tiered solution you can start the whole stack up with just one click.

Very cool stuff. Now I can logically group my OBIEE Test Lab VMs. If I ever get around to having the software (database, OBIEE) start automatically, I'll be rocking.



Lots of cool new stuff there. Read the article and go get the software.

Thursday, November 8, 2012

BIWA Summit 2013

I meant to write about this sooner...

If you're looking for a good BI/DW/Analytics focused event, check out the BIWA Summit, which takes place in January of 2013. If you're interested in speaking at the event (and you know you are), hurry up and get your abstract in here; it closes tomorrow (November 9th).

Day 1 will give us Tom Kyte, who will talk about What's new from Oracle in BI and Data warehousing. Day 2 will feature Vaishnavi Sashikanth, Vice President of Development for Oracle Advanced Analytics, who will speak on Making Big Data Analytics Accessible.

For more information, go here and here.