Monday, June 18, 2012

Comparing Protocol Buffers to XML and JSON payloads

Following up on my last blog entry, I was interested in getting some figures comparing payload sizes for data encoded using protocol buffers against XML and JSON.

Since the major argument in favor of protocol buffers is the reduced network impact, I thought it would be interesting to measure that in a mainframe integration environment.

I built my test scenario with a CICS program called LSFILEAQ. LSFILEAQ is part of the legstar test programs and uses a VSAM file that comes with an IBM demo application commonly found in CICS development partitions.

LSFILEAQ takes simple query parameters on input and sends back an array of replies. This is the COBOL structure that describes the LSFILEAQ commarea (its CICS communication area):

        01 DFHCOMMAREA.
           05 QUERY-DATA.
              10 CUSTOMER-NAME               PIC X(20).
              10 MAX-REPLIES                 PIC S9(4) COMP VALUE -1.
                  88 UNLIMITED     VALUE -1.
           05 REPLY-DATA.
              10 REPLY-COUNT                 PIC 9(8) COMP-3.
              10 CUSTOMER OCCURS 1 TO 100 DEPENDING ON REPLY-COUNT.
                  15 CUSTOMER-ID             PIC 9(6).
                  15 PERSONAL-DATA.
                     20 CUSTOMER-NAME        PIC X(20).
                     20 CUSTOMER-ADDRESS     PIC X(20).
                     20 CUSTOMER-PHONE       PIC X(8).
                  15 LAST-TRANS-DATE         PIC X(8).
                  15 FILLER REDEFINES LAST-TRANS-DATE.
                     20 LAST-TRANS-DAY       PIC X(2).
                     20 FILLER               PIC X.
                     20 LAST-TRANS-MONTH     PIC X(2).
                     20 FILLER               PIC X.
                     20 LAST-TRANS-YEAR      PIC X(2).
                  15 LAST-TRANS-AMOUNT       PIC $9999.99.
                  15 LAST-TRANS-COMMENT      PIC X(9).

I ran the program using a query yielding a result of 5 customers and measured the raw size of the commarea in bytes. I came up with 422 bytes. Note that this includes both the input and output parameters. These are untranslated bytes, in z/OS format.
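The 422-byte figure can be cross-checked against the COBOL structure itself. The sketch below is only a back-of-the-envelope calculation; the class and method names are made up for illustration, and it assumes the usual z/OS storage sizes (a halfword for PIC S9(4) COMP, packed decimal for COMP-3, one byte per character for the display and edited items).

```java
public class CommareaSize {

    /** Bytes used by a packed-decimal (COMP-3) item: two digits per byte plus a sign nibble. */
    public static int packedDecimalBytes(int digits) {
        return digits / 2 + 1;
    }

    /** Size in bytes of the LSFILEAQ commarea for a given number of CUSTOMER occurrences. */
    public static int commareaBytes(int customers) {
        // QUERY-DATA: CUSTOMER-NAME PIC X(20) + MAX-REPLIES PIC S9(4) COMP (2 bytes)
        int queryData = 20 + 2;
        // REPLY-COUNT PIC 9(8) COMP-3
        int replyCount = packedDecimalBytes(8);
        // One CUSTOMER entry: id (6) + name (20) + address (20) + phone (8)
        // + date (8, the REDEFINES adds no storage) + amount $9999.99 (8) + comment (9)
        int customer = 6 + 20 + 20 + 8 + 8 + 8 + 9;
        return queryData + replyCount + customers * customer;
    }

    public static void main(String[] args) {
        System.out.println(commareaBytes(5)); // prints 422
    }
}
```

With 5 customers this gives 22 + 5 + 5 x 79 = 422 bytes, which matches the measured size.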

Using the legstar transformers I then generated XML and JSON representations of that same z/OS data. This time, the data is translated to ASCII and formatted using XML or JSON. I then measured the sizes of the corresponding XML and JSON payloads and got:

XML: 1960 bytes
JSON: 1275 bytes

What these results mean is that if I choose to send XML to CICS instead of the raw z/OS data, I will increase the network load by 364% (the XML payload is about 4.6 times the size of the raw payload).

If I select the less greedy JSON to encode the payload, the network load increases by 202% (3 times the raw payload).
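For reference, the percentages above come from a simple ratio against the 422-byte raw payload. The following is a small sketch of that arithmetic; the class and method names are illustrative only.

```java
import java.util.Locale;

public class PayloadOverhead {

    /** Percentage by which an encoded payload exceeds the raw payload, rounded to the nearest integer. */
    public static long percentIncrease(int rawBytes, int encodedBytes) {
        return Math.round(100.0 * (encodedBytes - rawBytes) / rawBytes);
    }

    /** Size of the encoded payload expressed as a multiple of the raw payload. */
    public static double ratio(int rawBytes, int encodedBytes) {
        return (double) encodedBytes / rawBytes;
    }

    public static void main(String[] args) {
        int raw = 422;   // raw z/OS commarea
        int xml = 1960;  // XML payload
        int json = 1275; // JSON payload
        System.out.println(String.format(Locale.ROOT, "XML:  +%d%% (%.1fx)",
                percentIncrease(raw, xml), ratio(raw, xml)));   // XML:  +364% (4.6x)
        System.out.println(String.format(Locale.ROOT, "JSON: +%d%% (%.1fx)",
                percentIncrease(raw, json), ratio(raw, json))); // JSON: +202% (3.0x)
    }
}
```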

Now, what would be the equivalent protocol buffers payload?

To answer that, I first wrote a protocol buffer "proto" file using the protocol's Interface Description Language:

package customers;
option java_package = "com.example.customers";
option java_outer_classname = "CustomersProtos";

message CustomersQuery {
  required string customer_name_pattern = 1;
  optional int32 max_replies = 2;
}

message CustomersQueryReply {
  repeated Customer customers = 1;

  message Customer {
    required int32 customer_id = 1;
    required PersonalData personal_data = 2;
    optional TransactionDate last_transaction_date = 3;
    optional double last_transaction_amount = 4;
    optional string last_transaction_comment = 5;

    message PersonalData {
      required string customer_name = 1;
      required string customer_address = 2;
      required string customer_phone = 3;
    }

    message TransactionDate {
      required int32 transaction_year = 1;
      required int32 transaction_month = 2;
      required int32 transaction_day = 3;
    }
  }
}

This is close to the target LSFILEAQ commarea structure.

I then used protobuf-cobol to generate COBOL parsers and writers for each of the protocol buffers messages.

Rather than using the command line generation utility though, I used simple Java code that looks like this:

HasMaxSize maxSizeProvider = new HasMaxSize() {

    public Integer getMaxSize(String fieldName, Type fieldType) {
        if (fieldName.equals("customer_name_pattern")) {
            return 20;
        } else if (fieldName.equals("customer_name")) {
            return 20;
        } else if (fieldName.equals("customer_address")) {
            return 20;
        } else if (fieldName.equals("customer_phone")) {
            return 8;
        } else if (fieldName.equals("last_transaction_comment")) {
            return 9;
        }
        return null;
    }

    public Integer getMaxOccurs(String fieldName,
                                JavaType fieldType) {
        if (fieldName.equals("Customer")) {
            return 1000;
        }
        return null;
    }

};
new ProtoCobol()
  .setOutputDir(new File("target/generated-test-sources/cobol"))
  .setQualifiedClassName("com.example.customers.CustomersProtos")
  .addSizeProvider(maxSizeProvider)
  .run();

The "HasMaxSize" provider allows the generated COBOL code to implement the size limitations which are specific to COBOL.

Besides the parsers and writers, protobuf-cobol also generates copybooks for the various messages. This is what we get for the input and output messages:

       01  CustomersQuery.
           03  customer-name-pattern    PIC X(20) DISPLAY.
           03  max-replies PIC S9(9) COMP-5.

       01  CustomersQueryReply.
           03  OCCURS-COUNTERS--C.
             05  Customer--C PIC 9(9) COMP-5.
           03  Customer OCCURS 0 TO 1000 DEPENDING ON Customer--C.
             05  customer-id PIC S9(9) COMP-5.
             05  PersonalData.
               07  customer-name PIC X(20) DISPLAY.
               07  customer-address PIC X(20) DISPLAY.
               07  customer-phone PIC X(8) DISPLAY.
             05  TransactionDate.
               07  transaction-year PIC S9(9) COMP-5.
               07  transaction-month PIC S9(9) COMP-5.
               07  transaction-day PIC S9(9) COMP-5.
             05  last-transaction-amount COMP-2.
             05  last-transaction-comment PIC X(9) DISPLAY.

It is not exactly the same as the original commarea but is pretty close.

Using the protobuf-cobol parsers and writers and the same input and output data as for XML and JSON, I then obtained a protocol buffers payload size of 403 bytes (sum of input and output payloads).

The protocol buffers payload is even smaller than the raw z/OS data!

That might sound surprising but is the result of the extremely efficient way protocol buffers encodes data.

As an example, the COBOL group item QUERY-DATA size is 22 bytes on the mainframe. In my testing it contains hexadecimal:

"e25c4040404040404040404040404040404040400005".

The equivalent protocol buffer payload though is only 6 bytes long and contains hexadecimal:

"0a02532a1005".
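The six bytes above can be reproduced by hand. In the mainframe payload, e2 and 5c are the EBCDIC codes for "S" and "*", followed by 18 padding spaces and a 2-byte binary 5. The sketch below encodes the same CustomersQuery values ("S*" and 5) following the protocol buffers wire format: a tag byte per field, a length prefix for the string, and a varint for the integer. It is a hand-rolled illustration, not the protobuf-cobol or protobuf-java implementation, and the class and method names are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QueryWireFormat {

    private static final int WIRETYPE_VARINT = 0;
    private static final int WIRETYPE_LENGTH_DELIMITED = 2;

    /** Writes a varint: 7 bits per byte, low-order group first, high bit set on all but the last byte. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Writes a field tag: the field number shifted left by 3, combined with the wire type. */
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    /** Encodes a CustomersQuery message (field 1: string, field 2: int32). */
    public static byte[] encodeQuery(String customerNamePattern, int maxReplies) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] name = customerNamePattern.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 1, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, name.length);   // only the actual characters are sent, no padding
        out.write(name, 0, name.length);
        writeTag(out, 2, WIRETYPE_VARINT);
        writeVarint(out, maxReplies);    // small positive values fit in a single byte
        return out.toByteArray();
    }

    /** Formats bytes as lowercase hexadecimal. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex(encodeQuery("S*", 5))); // prints 0a02532a1005
    }
}
```

The savings come from the two places where COBOL uses fixed widths: the 20-byte name field shrinks to a tag, a length, and 2 characters, and the 2-byte binary integer shrinks to a tag and a 1-byte varint.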

This confirms that protocol buffers bring important benefits in terms of network traffic. For distributed applications under heavy load, this is bound to make a big difference compared with XML and JSON based systems.

3 comments:

  1. Excellent post, thanks! Just what I was looking for.

  2. Thanks, it very helpful.
    Can you please publish little more data on performance, I planning to choose either protocol-buffer vs JSON.
