Monday, October 25, 2010

LegStar for Pentaho Data Integration (Kettle)

I have spent the last few weeks digging into open source ETL tools.

The 3 most quoted products are Pentaho Data Integration (PDI), Talend and Clover.

It turns out Clover is only partially open source. The GUI is proprietary. So I spent more time on PDI and Talend.

PDI is the oldest product and, perhaps as a consequence, has the largest community. You can get a sense of that by comparing the Ohloh page for PDI to the Ohloh page for Talend.

But if you compare new threads per day on the PDI forum to those on the Talend forum, you can see that Talend is doing well too.

I decided to try out PDI first and developed a proof-of-concept implementation of LegStar for PDI.

You can see the result on Google Code as usual.

I have to say that I am very impressed with PDI, a product originally called Kettle and developed by Matt Casters.

PDI comes with a framework for people to develop additional plugins. For those who are interested, there is an excellent blog entry by Slawomir Chodnicki to get started.

I was able to reuse part of the PDI internal test framework to automate testing of my plugin. I have automated unit tests and integration tests.

It is also quite easy to deploy new plugins in PDI. It is a matter of packaging the plugin as a jar and dropping it into a particular location.

As usual, it is extremely helpful that the product is open source. In particular, I could easily debug my plugin in Eclipse, stepping through PDI code as well as my code.

My only regrets with PDI are that there is little Maven support and that the code is often not commented. That said, this did not prevent me from using Maven for all lifecycle phases of my plugin, and I was able to find my way around the PDI code, which is usually readable enough.

PDI also has support for parallel processing and clustering that I have not explored yet. I am looking forward to playing with these features next.

1 comment:

  1. Hi Fady,

    the other day we were talking about the idea for the plugin on IRC, and merely a few days later it's out there. How cool is that? :)

    Congrats and keep up the good work!

    Slawo
