Thursday, January 6, 2011

Batch integration with CICS, ETL integration with ESB

On mainframes, integration between batch and online processes is not simple. In an IBM CICS environment, for instance, if files and databases need to be shared between batch and CICS (and they almost always do), you have to deal with a number of issues such as:

  • Batch processes might update numerous records. It is usually inefficient to commit those changes after each record change. But a large number of uncommitted changes means a large number of locks, which in turn affects online activity by slowing down response times or even causing timeout errors.
  • Batch processes often deal with files in addition to databases. The most widely used technique to restart a batch after a failure is to back up these files before the batch is started and restore them in case of failure so that the batch can be restarted. Any online activity that touched the same files between the batch start and the failure effectively worked on uncommitted data.
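
The usual compromise for the first issue is a commit interval: commit every N records rather than after each one or only once at the end. Here is a minimal sketch of the idea in Python, using the standard library's sqlite3 module as a stand-in for a mainframe database. The `accounts` table, the `run_batch` and `demo` functions and the interval of 100 are all made up for illustration, and SQLite locks the whole database rather than individual records, so this only shows the commit cadence, not real lock contention.

```python
import sqlite3

COMMIT_INTERVAL = 100  # hypothetical value; a real one is tuned against lock and timeout behavior


def run_batch(conn, updates, commit_interval=COMMIT_INTERVAL):
    """Apply (account_id, delta) updates, committing every `commit_interval` records.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    cur = conn.cursor()
    for account_id, delta in updates:
        cur.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (delta, account_id),
        )
        pending += 1
        if pending == commit_interval:
            conn.commit()  # ends the transaction, so locks are held for at most N records
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final partial group
        commits += 1
    return commits


def demo():
    """Build a throwaway in-memory table and run a 250-record batch against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, 0)", [(i,) for i in range(250)])
    conn.commit()
    commits = run_batch(conn, [(i, 10) for i in range(250)], commit_interval=100)
    total = conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0]
    conn.close()
    return commits, total


if __name__ == "__main__":
    print(demo())
```

With 250 updates and an interval of 100, the batch commits three times (100, 100 and 50 records). A smaller interval shortens the time locks are held at the cost of more commit overhead. It does nothing for the second issue: files restored from a backup still invalidate whatever online activity saw in the meantime.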

To avoid such issues, many mainframe shops initially segregated batch and online activities in different time frames. Typically CICS systems were brought down in the evening while batch processes were running and restarted in the morning.

Because of this strong separation between online and batch activity, very little integration between batch and CICS systems was developed. There was some degree of code reuse with COBOL copybooks, but you would never see binary reuse (a batch program calling a CICS program, for instance). Actually, this was not even possible before IBM introduced the External CICS Interface (EXCI) technology.

The mainframe nightly batch window rapidly came under pressure though. As mainframes grew larger and merged with one another, they started serving populations across many time zones. It is not uncommon for users to be spread all over the world. This forced mainframe shops to rearchitect their batch processes and led IBM to develop new technologies such as EXCI and VSAM Record Level Sharing.

At the same time, the need for business logic reuse between batch and CICS became more important because systems became more complex. Mainframe developers resorted to database triggers, stored procedures or WebSphere MQ triggers, probably beyond the original intent of such technologies, because there was no other way of sharing logic at the binary level.

I see some similarity with the ETL versus ESB situation in the Java world. These systems do very little to integrate with one another today.

By nature, ETL is similar to batch in the sense that it deals with large numbers of records and multiple file systems and databases. ESB is closer to online as it deals with smaller transactional events.

Both ETL and ESB products claim to be transformation engines though, and indeed the term "Transformation" is widely used in the documentation of both types of products. So you might wonder why it is almost impossible to reuse a transformation written for an ETL in an ESB.

If IBM was forced to introduce EXCI for batch-to-CICS communication, I wouldn't be surprised if users forced ETL and ESB vendors to integrate with one another more closely.

I don't mean that ETL and ESB technologies need to be tightly integrated; after all, they do different jobs. But it would be nice if some level of transformation reusability could be achieved.
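
To make the idea concrete, here is a rough sketch of the kind of reuse I have in mind, written in Python for brevity. Nothing here is the API of any actual ETL or ESB product: `normalize_customer`, `etl_run` and `esb_on_message` are hypothetical names, and a record is simply a dictionary. The point is only that the transformation is a plain function with no knowledge of which engine calls it.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, str]
Transform = Callable[[Record], Record]


def normalize_customer(record: Record) -> Record:
    """Hypothetical transformation: trim the name and upper-case the country code."""
    return {
        **record,
        "name": record["name"].strip(),
        "country": record["country"].strip().upper(),
    }


def etl_run(records: Iterable[Record], transform: Transform) -> Iterator[Record]:
    """ETL-style usage: stream a large record set through the transformation."""
    for record in records:
        yield transform(record)


def esb_on_message(message: Record, transform: Transform) -> Record:
    """ESB-style usage: apply the same transformation to a single event."""
    return transform(message)


if __name__ == "__main__":
    rows = [{"name": " Ada ", "country": "fr"}, {"name": "Bob", "country": " us"}]
    print(list(etl_run(rows, normalize_customer)))
    print(esb_on_message({"name": "Eve ", "country": "de"}, normalize_customer))
```

The two callers differ only in how records arrive (a stream versus one event at a time), which is the part that legitimately belongs to each product. The mapping logic itself is shared.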