Friday, December 18, 2009

A Maven plugin to upload sources on z/OS

LegStar is a mix of Java code and z/OS native code written in C/370 and COBOL.
The LegStar build system makes extensive use of Maven, a widely used build automation tool. Maven drives the LegStar release process by pulling source code from SCM, compiling, testing and generating documentation.
All the LegStar z/OS native source code is managed locally with Subversion. What was missing so far was the capability to upload and compile these sources on z/OS as part of the Maven build lifecycle.
After several attempts at using the Ant FTP task, I decided to write my own Maven plugin. Following the tradition of open sourcing everything in LegStar, you can find the zosupload project on Google Code hosting.
It is quite rudimentary at this stage, but it has the minimal capability to upload sources to various PDS libraries, submit JCL for execution and check the condition codes returned, all as part of a Maven build.
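To give an idea of the underlying mechanism the plugin automates, here is a minimal Java sketch using the z/OS FTP server and its JES interface through Apache commons-net. The host name, credentials, dataset and job names are made up, and the actual plugin may well proceed differently.

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;

    import org.apache.commons.net.ftp.FTP;
    import org.apache.commons.net.ftp.FTPClient;

    /** Uploads a source member to a PDS and submits a JCL job through the z/OS FTP server. */
    public class ZosUploadSketch {

        public static void main(String[] args) throws Exception {
            FTPClient ftp = new FTPClient();
            ftp.connect("mainframe.example.com");   // hypothetical host
            ftp.login("MYUSER", "MYPASS");          // hypothetical credentials
            ftp.setFileType(FTP.ASCII_FILE_TYPE);   // let the FTP server translate ASCII to EBCDIC

            // 1. Upload a COBOL source into a PDS member (made-up dataset and member names).
            InputStream source = new ByteArrayInputStream(
                    "       IDENTIFICATION DIVISION.".getBytes("US-ASCII"));
            ftp.storeFile("'MYUSER.COBOL.SOURCE(MYPROG)'", source);

            // 2. Switch the FTP server to JES mode and submit a compile job.
            ftp.sendSiteCommand("FILETYPE=JES");
            InputStream jcl = new ByteArrayInputStream(
                    "//COMPILE JOB (ACCT),'COMPILE MYPROG',CLASS=A".getBytes("US-ASCII"));
            ftp.storeFile("COMPILE.JCL", jcl);

            // The FTP reply should contain the submitted job id. Polling JES for the job
            // output and extracting the condition codes is the part the plugin automates.
            System.out.println(ftp.getReplyString());

            ftp.logout();
            ftp.disconnect();
        }
    }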
I expect others will have the same mix of local Java and native z/OS code they would like to keep in sync so I am hoping they will contribute to that project.

Monday, December 7, 2009

COBOL-aware middleware

Two phenomena with opposing effects have shaped COBOL structures over time.
The first is a tendency to build complex data structures with many fields. This allowed developers to store a maximum amount of information in a VSAM record or a database segment, for instance. It was an important factor in reducing physical I/O and getting the best performance possible. When data structures were passed from program to program, they also tended to be large, because developers would typically try to reduce the number of programs and hence would rather write big programs that handle many cases than many smaller ones.
When the mainframe opened up to program-to-program communication over networks, though, these large structures became a problem because networks were slow. So the next tendency was to find ways to reduce the size of the data sent over the network, even if the structures stayed complex and large.
The most widely used mechanism to reduce the size of the data is the variable size array (an array with the DEPENDING ON clause). This is an awkward type of array for people used to Java and C: it is an array whose actual size is given by a variable appearing somewhere in the structure (before the array itself).
The second mechanism is not related to COBOL per se but rather to the middleware used. Because structures contain many fields, it often happens that these fields hold nothing but default values. For instance, alphanumeric fields might contain binary zeros, which is usually interpreted as "no content". If the middleware is smart enough, it can avoid sending these uninitialized fields over the network, thus saving a lot of bandwidth.
Variable size arrays also need to be handled correctly by middleware. For instance, sending the maximum number of items would be wasteful. A trickier situation arises when the middleware invokes a mainframe program that returns a variable size array. Often, the memory the target program needs to operate must be allocated beforehand by the middleware. But how can the middleware anticipate the number of rows the program will return?
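To make the variable size array point concrete, here is a minimal Java sketch (with a made-up COBOL layout and field sizes) of the arithmetic a COBOL-aware middleware can apply to send only the populated part of the structure:

    /**
     * Why middleware must understand the DEPENDING ON clause.
     *
     * Assumed (made-up) COBOL layout:
     *   01 REPLY-DATA.
     *      05 ITEM-COUNT              PIC 9(4) COMP.
     *      05 ITEM OCCURS 0 TO 500 TIMES
     *              DEPENDING ON ITEM-COUNT.
     *         10 ITEM-DESCRIPTION     PIC X(50).
     */
    public class VariableArraySketch {

        static final int COUNTER_BYTES = 2;   // PIC 9(4) COMP is a 2-byte binary
        static final int ITEM_BYTES = 50;     // PIC X(50)
        static final int MAX_ITEMS = 500;

        /** Bytes that actually need to travel, given the counter value. */
        static int payloadLength(int itemCount) {
            return COUNTER_BYTES + itemCount * ITEM_BYTES;
        }

        public static void main(String[] args) {
            // A COBOL-unaware middleware would ship the maximum structure every time.
            System.out.println("Always max : " + payloadLength(MAX_ITEMS) + " bytes");
            // A COBOL-aware one reads ITEM-COUNT and sends only what is populated.
            System.out.println("5 items    : " + payloadLength(5) + " bytes");
        }
    }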
As you can see, there are good reasons why middleware used to communicate with mainframe programs needs to be COBOL-aware.

Sunday, November 22, 2009

First experience with cloud computing (sort of)

Like many of you I am sure, I have been pounded with loads of articles on cloud computing.
I start to believe there is something to a technology, beyond pure marketing hype, when IEEE Internet Computing dedicates a complete issue to the topic, which it did in October 2009.
So I decided to take a look and started using Google App Engine, because it is free below certain quotas and also because I had some early experience with the Google Web Toolkit, which is well supported by GAE.
GAE allows you to run Java applications, essentially classical J2EE servlets, with a number of restrictions (for instance, you cannot write to a file or open a port).
Since we recently restarted the COBOL Structure to XML Schema project, it seemed like a good idea to deploy it as a service available on the cloud. This way, developers who would like to get a sense of what the product does without investing time downloading and installing can do so.
The result is now available and I almost immediately started to get hits... and problems.
The first problem, of course, is that the translator is not a validating COBOL parser; I mean it is not a complete syntax checker. It is meant to process COBOL fragments that are supposed to compile cleanly in the first place. And yet, it is tempting to type COBOL statements in the input textarea of an HTML page starting at column 1 instead of column 8. As things stand, you often end up with an empty XML schema because the parser dropped everything it did not recognize.
I guess we'll have to add more syntax checks after all.
The second problem has to do with GAE itself and Java. It has been discussed extensively on the Google App Engine group. When GAE receives a request, it picks an instance of a VM somewhere in the cloud to service it. Chances are that this VM was last used for something totally different from running your own application. For this reason, Google actually cold starts the VM, which results in a large consumption of CPU... and that counts against your quota!
Humm. That first experience has changed my view on cloud computing!

Wednesday, November 18, 2009

COBOL is weakly typed

Yes, whatever its proponents think, COBOL is far from an ideal programming language.
Try to push an integer into a Java String and you will be stopped at compile time. Now try something like "MOVE 15 TO A" where A has been defined as PIC X(2): not only will the compiler let you through, but at runtime it will actually work. You end up with the EBCDIC representation of the characters '1' and '5' as the content of A.
Actually, you can move pretty much anything into a COBOL PIC X. You commonly do things like "MOVE LOW-VALUES TO A", which leaves A filled with binary zeros. This is particularly useful in CICS/BMS applications, where a field filled with low values results in no data being sent to the screen, while a field filled with space characters increases the volume of data sent (and therefore uses up the once precious network bandwidth).
This is always a source of problems with integration solutions because the most natural mapping to a COBOL PIC X is a Java String. So you actually map a COBOL type that is not limited to characters to a Java type that only accepts characters.
It is important that an alternative mapping be available. It is sometimes necessary, for instance, to map a PIC X to a Java byte[] because the content cannot predictably be converted to characters.
When mapping to a Java String is mandatory (it usually is), it is also important that low-level conversion routines remove non-character content coming from the COBOL program before the Java String is populated.
Conversely, it is important that non-character content gets inserted into the COBOL data when needed. For instance, if a COBOL field needs to be filled with low values (as opposed to space characters), then the integration conversion routines should provide an option to do so.
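Here is a minimal Java sketch of those two directions. This is not the actual LegStar conversion code; it assumes the EBCDIC charset IBM1047 is available in the JRE and that trailing binary zeros are the non-character content to remove.

    import java.nio.charset.Charset;

    /** Sketch of PIC X conversions between host bytes and Java Strings. */
    public class PicXConversionSketch {

        static final Charset EBCDIC = Charset.forName("IBM1047");

        /** Host to Java: decode a PIC X field and strip trailing low-values. */
        static String toJavaString(byte[] hostField) {
            int len = hostField.length;
            while (len > 0 && hostField[len - 1] == 0x00) {
                len--;                                   // drop binary zeros
            }
            return new String(hostField, 0, len, EBCDIC);
        }

        /** Java to host: encode a String into a PIC X(n) field padded with low-values. */
        static byte[] toHostField(String value, int picLength) {
            byte[] hostField = new byte[picLength];      // Java initializes to 0x00 (LOW-VALUES)
            byte[] encoded = value.getBytes(EBCDIC);
            System.arraycopy(encoded, 0, hostField, 0, Math.min(encoded.length, picLength));
            return hostField;
        }

        public static void main(String[] args) {
            byte[] fromHost = { (byte) 0xC1, (byte) 0xC2, 0x00, 0x00 };  // EBCDIC "AB" + low-values
            System.out.println("[" + toJavaString(fromHost) + "]");      // prints [AB]
            System.out.println(toHostField("AB", 4).length);             // a 4-byte PIC X(4) field
        }
    }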
Keep in mind that the mainframe program might react differently to a field filled with low values as opposed to one filled with characters!

Friday, November 13, 2009

A new COBOL structure to XML schema translator

I have posted an initial release of a new project called legstar-cob2xsd that basically does COBOL structure to XML Schema translation.
There were several factors that led to this project:
  • The legstar-schemagen module in LegStar that does the job today was written in C, and the parsing logic was hand-coded. I think this might have driven away some users. legstar-cob2xsd is pure Java and uses ANTLR grammars for COBOL structure parsing.
  • There is a clear need for an autonomous open source COBOL to XML Schema utility. So it makes sense to isolate that feature in its own project. People who need just that functionality won't have to figure out how to extract it from the other LegStar modules.
  • legstar-schemagen was systematically adding LegStar JAXB style annotations to the XML Schema produced. While this is still needed if you want to use LegStar for runtime transformations, this is not the default option anymore. This means people can use the resulting XML Schema totally outside LegStar if they want.
  • The clear separation of the COBOL parsing logic from the XML Schema generation means it is now much easier to create other targets, for instance JSON schemas.
  • Finally the fact that legstar-cob2xsd is in java allows JUnit tests to be much more comprehensive (and they are!).

Tuesday, November 10, 2009

What does CICS equate to in the Java world?

It is always tough to explain to someone with a Java background what exactly CICS is.
One misconception I often hear is that it is similar to JBoss TS (Arjuna), the transaction manager in JBoss.
I think the issue is with the "Transaction" moniker. In the Java world, it is generally understood as a "Database transaction", something that has to do with coordinating resource updates.
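As an illustration of that Java-side meaning, here is a minimal sketch of an explicitly demarcated JTA unit of work (assuming a Java EE container that exposes UserTransaction through JNDI):

    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    /** What "transaction" usually means to a Java developer: a demarcated unit of resource updates. */
    public class DatabaseTransactionSketch {

        public void updateResources() throws Exception {
            UserTransaction tx = (UserTransaction) new InitialContext()
                    .lookup("java:comp/UserTransaction");
            tx.begin();
            try {
                // ... update a database table, send a JMS message, etc. ...
                tx.commit();        // make all the updates permanent
            } catch (Exception e) {
                tx.rollback();      // undo everything done since begin()
                throw e;
            }
        }
    }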
In CICS, the meaning is different. If you monitor an active CICS system, what you will see are a number of active transactions. CICS transactions are primarily scheduling units. A CICS transaction is associated with an initial executable program, which can in turn call other programs. CICS transactions get allotted chunks of memory for their programs to use, get CPU cycles when CICS decides, are authorized or not for a given user, etc. That's why CICS is called a Transaction Monitor: what it does is monitor CICS transactions.
A CICS transaction might very well never perform any update to any resource. It would still be a CICS transaction. A CICS transaction can send a form to a screen or receive data entered from a keyboard. CICS transactions can save conversation contexts, pretty much like servlets do.
Now, if a CICS transaction (or to be more precise, some program it is associated with) does perform an update to a resource (say a VSAM file), then the CICS transaction is actually the default boundary from a "database transaction" standpoint. This means that when the transaction ends normally, all updates are committed, but if the transaction fails (abends, in CICS jargon), then all updates since the start of the CICS transaction are undone.
The CICS API also has explicit commit/rollback commands (SYNCPOINT and SYNCPOINT ROLLBACK) to give programs more control over the database transaction. From this standpoint, CICS is a local transaction manager. Now, this is true for resources such as VSAM, but more complex systems, such as DB2 or IMS-DB, have their own transaction managers. So who does the distributed transaction coordination that is needed when a CICS transaction updates both a VSAM file and a DB2 table?
Well, that would be RRS (Resource Recovery Services), a separate address space (process) in z/OS. That is the closest thing I can think of to JBoss TS in a z/OS world.

Tuesday, November 3, 2009

When asked what the competition to LegStar is, I generally reply no one, because there are no other open source integration solutions for mainframes (to the best of my knowledge).
There are several commercial competitors though. They all seem to gather in the same places. DataDirect and HostBridge are the only ones missing from this list.
Most of these vendors (maybe Seagull to a lesser extent) are mainframe-centric. What I mean by that is that their products run natively on the mainframe, are priced/licensed accordingly and are sold to mainframe minded project leads. This is a closed circle. No place for open source there. Customers are not asking for it and vendors don't want to hear about it.
But out there, there is a growing population of Java-minded project leads and developers who are taking on higher and higher responsibilities in large IT departments. Many times, they have to deal with COBOL, PL/I, CICS or IMS applications as part of their projects.
This younger generation is a lot more responsive to open source. We are working for them.

Thursday, October 29, 2009

ESBs: translators in an SOA world

A strange article has popped up on zJournal about SOA. I am not sure I fully understand the content, but it seems to be advice to CIOs basically saying: SOA is difficult to get right and ESBs are a bad thing.
This is the first time I read something clearly hostile to the ESB architecture. The author seems to believe that they are too complex and that they mostly try to imitate old EAI systems, rather than embracing SOA.
The article seems to imply that the only way to pure SOA is rewriting applications in such a way that they are natively service enabled. It states, in several places, that "APIs" are a bad thing.
Following this idea, I guess you would have to replace SQL, SMTP, JMS, Excel macros... with Web Services.
It's a little bit as if you were saying: all these languages that people speak around the world make communication difficult, let's all speak Esperanto.
As long as we are not all speaking Esperanto, there will be a need for translators. I guess that's why ESBs are so useful to an SOA.

Sunday, October 18, 2009

Mainframes are not open source friendly

IBM is a strange beast. On the one hand, they gave Eclipse away and donated code to the Apache Software Foundation, but on the other, they don't seem to embrace open initiatives when mainframes are involved.
You can read more about what happened to PSI and T3 Technologies on openmainframe.org. These examples are more on the hardware side but there is no sign of IBM opening up any of its z/OS software.
Most of the open source initiatives I have heard of, such as http://www.opencobol.org/ or http://h3270.sourceforge.net/index.html, are from third parties and do not attempt to run on z/OS itself.
I think that IBM's attitude toward open source on z/OS is harming the mainframe community and is counterproductive in the long term.

Sunday, October 11, 2009

Mainframe-centric integration costs

Gregg Wilhoit, DataDirect's chief architect, is one of the smartest professionals I have met. He continues to drive the Shadow line of products.
Shadow is rooted in the data integration space but has evolved over time to include programmatic integration as well.
From an architecture standpoint, Shadow is a mainframe-centric integration solution. It is probably one of the fastest and most stable products available that run natively on the mainframe (other such products are GT Software's Ivory, HostBridge, SOA Software's SOLA and IBM's CICS Web Services support).
The major argument against mainframe-centric integration solutions is usually one of cost. With the broad adoption of SOA, for instance, all mainframe-centric solutions suffer from the cost of XML parsing and formatting. This is a CPU-intensive activity that affects the overall mainframe cost of ownership (on mainframes, the more CPU you consume, the higher your software license fees are).
Gregg was one of the first (maybe the first, as far as I know) to make it possible to reduce such CPU costs by exploiting IBM's specialty engines on System z. These co-processors can offload Java and DB2-related workloads from the main processors, thereby avoiding license fee inflation. Since then, other companies have jumped in.
Although zIIP/zAAP offloading is good news for mainframe-centric solutions, I doubt it will significantly reduce their total cost. One important aspect of integration, for instance, is that it needs monitoring. It is a whole lot trickier to tune mainframe-centric solutions than distributed ones. It also requires very specific and costly skills and software.
When comparing distributed integration to mainframe-centric integration, it is important to compare development costs, administration costs and ultimately exit costs. I doubt that the total would be in favor of mainframe centricity.
In my opinion, the major argument for mainframe-centric solutions is one of performance and stability. The cost though will probably remain high.

Sunday, October 4, 2009

Openworld Forum

Went to the Open World Forum last week. This was the first time I attended, and I found the event quite interesting.
There were a lot more business suits than I expected and most of the topics revolved around the business aspects of open source.
Had a chance to listen to Nick Halsey from Jaspersoft, Josep Mitja from Openbravo and Yves de Montcheuil from Talend. I find it much more interesting to listen to these relatively young, yet quite successful, open source companies than to the older organizations such as Apache, Eclipse and Linux, which seem out of reach.
It seems these smaller OSS companies make their money mostly by selling subscriptions. This model has been very successful for Red Hat, of course, but I am still wondering whether it is as efficient for server technologies. I thought services would be an important secondary source of revenue for them, but this does not seem to be the case. The explanation could be that these companies rely heavily on SIs to get them into accounts and therefore cannot sell too many services for fear of openly competing with those same SIs. The bright side is that this forces them to focus on software engineering rather than diverting their resources to services.
The most impressive presentation I saw was from Mark Shuttleworth, the Ubuntu project founder. He had 3 slides with a single word on each of them. This is the first time I can remember 100% of a presenter's slide content without taking a single note!

Friday, October 2, 2009

ANTLR and Terence Parr

I am reading Terence Parr's The Definitive ANTLR Reference. I have to say, it is not often that I regret not being 25 years younger so that I could attend Terence's courses at the University of San Francisco.
The theory behind compilers is still complex, but Terence does a great job of making things clearer for people like us, with concrete problems on our hands.

Tuesday, September 22, 2009

Mainframe hardware & OS versus Mainframe Applications

Another interesting study by Forrester has been published. Although it is a study commissioned by CA, there are some interesting figures in it, namely the 15,000 mainframe sites remaining.
One company may own several mainframe sites, and a mainframe site may host more than one physical machine. A mainframe site's size is usually measured in MIPS, the accumulated power of all the mainframe hardware running at that site.
The total amount of MIPS is growing 20% annually according to Forrester.
The problem with figures coming from IBM, CA or BMC is that they all focus on MIPS, i.e. hardware.
But a mainframe today can run z/VM and several hundred z/Linux images. As such, it is essentially a virtualization environment for Linux. These Linux images probably run Java/J2EE applications of some form. No doubt there is a need for such centralized architectures, which explains part of the MIPS growth.
But how about the more traditional COBOL-CICS or PL/I-IMS applications? Do they have a share in the 20% annual growth rate?
It is probably not a good idea to look at the compound MIPS growth rate anymore since it now sustains workloads that are totally unrelated.
I would love to see a study focusing on legacy-MIPS.

First post on the LegStar blog

I have been working on open source software for the last 3 years or so. One of the most frustrating aspects to me has been the lack of feedback from users.
Most people who download the software never leave a message of any kind. Only a tiny minority do.
LegStar is an open source integration solution for mainframe applications. It deals with COBOL and CICS applications and it does so by leveraging J2EE and ESB servers such as Mule and JBoss ESB.
I have decided to start this blog hoping it will help gather more feedback.