Saturday, March 26, 2011

When it comes to Mainframes, nothing is simple

IEEE Software has published an article by Belgian researchers who attempted to re-engineer a mainframe application using automated tooling they knew worked in other environments.

Although the article mentions important lessons learned, it proves once more that it is extremely difficult to reconstruct knowledge about mainframe applications automatically from static code analysis.

They ran into an issue that I, for one, have encountered several times: an apparently autonomous set of COBOL programs actually calls some Assembler magic that passes control to other, hidden programs that are only identified at runtime. There is little you can't do on a mainframe using Assembler, and one thing you can do is load executables dynamically.
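To make the problem concrete, here is a minimal sketch in Python of what a naive static call-graph scanner sees. It is not the tooling the researchers used. The COBOL fragment, the program names (PAYCALC, PGM00042) and the Assembler loader ASMLOAD are all hypothetical, and the regular expression only approximates fixed-format COBOL.

```python
import re

# CALL 'LITERAL' or CALL "LITERAL" names its target in the source;
# CALL IDENTIFIER names a data item whose content is only known at runtime.
CALL_RE = re.compile(
    r"\bCALL\s+(?:'([^']+)'|\"([^\"]+)\"|([A-Z0-9][A-Z0-9-]*))",
    re.IGNORECASE,
)

def classify_calls(source):
    """Return (resolved, unresolved) call targets found in COBOL source."""
    resolved, unresolved = [], []
    for line in source.splitlines():
        # Fixed-format COBOL: '*' or '/' in column 7 marks a comment line.
        if len(line) > 6 and line[6] in "*/":
            continue
        for match in CALL_RE.finditer(line):
            literal = match.group(1) or match.group(2)
            if literal:
                resolved.append(literal.upper())
            else:
                unresolved.append(match.group(3).upper())
    return resolved, unresolved

# Hypothetical fragment: ASMLOAD stands for an in-house Assembler routine
# that loads whatever program name it receives in WS-PGM-NAME.
SAMPLE = """\
       PROCEDURE DIVISION.
      * CALL 'IGNORED' in a comment line
           CALL 'PAYCALC' USING WS-AREA.
           MOVE 'PGM00042' TO WS-PGM-NAME.
           CALL WS-PGM-NAME USING WS-AREA.
           CALL 'ASMLOAD' USING WS-PGM-NAME.
"""

resolved, unresolved = classify_calls(SAMPLE)
print("resolved:", resolved)
print("unresolved:", unresolved)
```

The scanner resolves PAYCALC and ASMLOAD by name, and it can at least flag the call through WS-PGM-NAME as unresolved. What it cannot see is that ASMLOAD itself hands control to yet another program: that edge of the call graph lives inside the Assembler routine and in runtime data, not in the COBOL source.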

During the 1980s, the heyday of Mainframe programming, every IT shop had one or more Assembler gurus. With much looser budget controls than today, it was possible to spend considerable time developing in-house frameworks, optimizing performance and so forth. This did not only have negative effects: these optimized Assembler routines probably saved large amounts of money by reducing CPU consumption, a major parameter on IBM bills.

Most code analysis tools are COBOL-centric. One reason is that COBOL is not that hard to parse. To my knowledge, there are no automatic code analysis tools for MVS Assembler (DataTek has an impressive tool for Assembler to COBOL, but it usually requires some level of human assistance). That's probably because the complexity such a tool would require is several orders of magnitude higher than for COBOL.

For a typical Mainframe shop, the volume of MVS Assembler programs is much smaller than the volume of COBOL. That might explain why vendors would have a hard time justifying the cost of writing a very complex Assembler code analysis tool.

So the result is that COBOL code analysis tools fail at multilanguage boundaries. Of course, there are other multilanguage boundaries on Mainframes, for instance COBOL to BMS or MFS macros, COBOL to JCL, COBOL to SQL, COBOL to CICS and COBOL to IMS, not to mention all the third-party tools one can find on Mainframes. One thing is for sure: I have never seen a pure COBOL application.
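As a rough illustration of how quickly these boundaries show up, the following Python sketch counts embedded EXEC ... END-EXEC blocks by sublanguage. It assumes well-formed blocks and ignores comment lines and string literals. The function name and the sample fragment are made up for the example.

```python
import re
from collections import Counter

# An embedded block starts with EXEC <sublanguage> and ends with END-EXEC.
EXEC_RE = re.compile(
    r"\bEXEC\s+([A-Z0-9]+)\b(.*?)\bEND-EXEC\b",
    re.IGNORECASE | re.DOTALL,
)

def boundary_counts(source):
    """Count embedded EXEC blocks per sublanguage (SQL, CICS, ...)."""
    return Counter(m.group(1).upper() for m in EXEC_RE.finditer(source))

# Hypothetical fragment mixing SQL and CICS inside a COBOL program.
SAMPLE_EXEC = """\
           EXEC SQL
               SELECT NAME INTO :WS-NAME FROM CUSTOMER
               WHERE ID = :WS-ID
           END-EXEC.
           EXEC CICS LINK PROGRAM(WS-PGM-NAME)
               COMMAREA(WS-AREA)
           END-EXEC.
           EXEC SQL COMMIT END-EXEC.
"""

counts = boundary_counts(SAMPLE_EXEC)
print(dict(counts))
```

Counting the blocks is the easy part. Each block is written in a different language with its own grammar and its own runtime semantics, so a tool that only understands COBOL has to treat its content as opaque. In this fragment, the CICS LINK even names its target program through a variable.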

The problem is that without reliable code analysis tools, you can't reconstruct knowledge with a bottom-up approach. What you end up doing is taking the top-down approach: chasing the last application expert on site, hoping that he hasn't retired yet.

The complexity revealed by this IEEE article explains a good part of the longevity of Mainframe applications. I know IBM prefers the reliability, availability and security explanation. But I know many IT shops that would have happily migrated off their mainframes if it were easy.

LegStar is also affected by this complexity of course. I often have a hard time explaining to Java developers why the Java side of LegStar works so easily while the Mainframe side, which has much less code in it, is often much harder to get to work properly...