Wednesday, December 19, 2012

Learning Pentaho Data Integrator

aka Kettle, aka PDI.

I've recently taken on a data integration gig and I'll be using Pentaho Data Integration (PDI). I've read about Pentaho for years but never got around to actually using it. I'm excited about the opportunity.

What is PDI?

...delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools.

I'll be using the Enterprise Edition (EE), which comes with commercial support, similar to how Red Hat works...I think.

This post is mainly for me, naturally. I'm going to list the references I've found so far and add to them over time, similar to what I do (err, did) for Learning OBIEE.

Actually, I'll just start with the helpful email I received after being added to the account.
I love that you can download and play with the software yourself. Of course, the Community Edition (CE) is open source, so that makes sense. I'm not sure whether you can get EE for free, though.

There's a community page as well, with links to a lot of great resources. So far, my favorite has to be the IRC channel hosted at freenode. Note that there are two pound signs, as in ##pentaho. I'd been lurking there for a few weeks and finally got up the nerve (what? me shy?) to ask a question last week. HansVA and mburgess_pdi helped get me moving again on a particular problem. Good stuff.

I'm sure I'll add more as time goes on. That's it for now.

Update after original posting...
  • (Kettle, aka PDI) Wiki


Stewart Bryson said...

I'm curious about whether it has push-down/set-based type functionality, or whether it's a row-by-row processor. Let me know as soon as you find out.

oraclenerd said...


Initial look-see is row-by-row. If that changes, I'll let you know.