I decided to try Java EE Batch first, then Spring Batch, and finally Spring for Hadoop with Pig. Java EE Batch is a relatively new specification, but for a number of use cases it will be sufficient. That sentence says it all – I think Java EE Batch is OK. Nothing more yet. As you will see, you have to do a lot of things yourself, and other things are just cumbersome.

So what is Java EE Batch, a.k.a. JSR 352 (Batch Processing for the Java Platform)? It is a Java EE family specification, heavily inspired by Spring Batch, that describes a framework for batch job processing. It lets you describe jobs in the Job Specification Language (JSL) – only an XML format is supported. I guess that for batch apps this may be OK. I can live with it 😉

Java EE Batch specifies two types of processing:

  • Chunked – here we have three phases: reading, processing and writing. Chunked steps can be restarted from a checkpoint.
  • Batchlet – this is a free form step – do whatever is required.

Java EE Batch has many features like:

  • Routing processing using decision steps
  • Grouping steps into flows
  • Partitioned steps – step instances executed in parallel. A partitioned step allows splitting a job across multiple threads in a more technical way. We can specify a partition mapper, a partition reducer and other partition-specific elements. I won’t go into detail on this now.
  • Splits – a split specifies flows to be executed in parallel – there can be a few flow definitions, so this is a more process- or business-oriented parallelization of work.
  • Exception handling:
    • Skipping steps
    • Retrying steps
  • JobContext to pass batch job data to steps

CDI is used and supported. But with all of these, specifying job context parameters is inconvenient (in 2005 I would say it’s OK ;)), and we must do a lot ourselves – like moving files, reading them and passing intermediate results. It would be nice to have some utilities to help with standard batch-related chores.
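
For example, a batch artifact can inject the JobContext and read job-level properties declared in the JSL. A minimal sketch (the importDir property and the ImportDirCheck batchlet are made up just for illustration):

    import java.util.Properties;
    import javax.batch.api.AbstractBatchlet;
    import javax.batch.runtime.context.JobContext;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named
    public class ImportDirCheck extends AbstractBatchlet {

        // JobContext is injectable into batch artifacts and exposes job-level properties
        @Inject
        private JobContext jobContext;

        @Override
        public String process() throws Exception {
            Properties props = jobContext.getProperties();
            // importDir would have to be declared as a <property> of the job in the JSL
            String importDir = props.getProperty("importDir", "/tmp/import");
            System.out.println("Importing from " + importDir);
            return "COMPLETED";
        }
    }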

Ok, let’s see some example – below you can find a JSL definition for a demo batch job. You can build this XML with ease using the Eclipse XML editor (or some plugins for NetBeans) – just choose the XML from schema option, then select XML Catalog and the jobXML_1_0.xsd entry:

[Screenshot: Eclipse new XML file wizard with the jobXML_1_0.xsd XML Catalog entry selected]
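
A job definition along these lines captures what the rest of this post walks through (the ids initialize-import, store-resources and reject-resources and the reader/processor/writer bean names are placeholders of this sketch; initializeResourceImport, resource-router, resource-processor, path-decision, pathDecider and the STORE exit value are the names used in the demo):

    <?xml version="1.0" encoding="UTF-8"?>
    <jsl:job id="resource-import" version="1.0"
             xmlns:jsl="http://xmlns.jcp.org/xml/ns/javaee">

        <!-- free-form step: move the incoming file to a work directory -->
        <jsl:step id="initialize-import" next="resource-router">
            <jsl:batchlet ref="initializeResourceImport" />
        </jsl:step>

        <!-- flow with a chunked step: read, process and write the file line by line -->
        <jsl:flow id="resource-router" next="path-decision">
            <jsl:step id="resource-processor">
                <jsl:chunk item-count="10">
                    <jsl:reader ref="resourceReader" />
                    <jsl:processor ref="resourceProcessor" />
                    <jsl:writer ref="resourceWriter" />
                </jsl:chunk>
            </jsl:step>
        </jsl:flow>

        <!-- decision step routing on the exit status returned by the decider -->
        <jsl:decision id="path-decision" ref="pathDecider">
            <jsl:next on="STORE" to="store-resources" />
            <jsl:next on="REJECT" to="reject-resources" />
        </jsl:decision>

        <jsl:step id="store-resources">
            <jsl:batchlet ref="storeResources" />
        </jsl:step>

        <jsl:step id="reject-resources">
            <jsl:batchlet ref="rejectResources" />
        </jsl:step>
    </jsl:job>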

First we move the file from the import location to a temporary location so it can be processed without any problems:

next="resource-router" specifies the next element to be invoked – there is a possibility for a typo here, so it’s a bit inconvenient.

<jsl:batchlet ref="initializeResourceImport" /> specifies a batchlet (free-form) step that will move the file. The class for this step will be instantiated as a CDI bean named "initializeResourceImport":
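
The batchlet implementation itself is a few lines (the class name and the hard-coded paths are just for this sketch):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import javax.batch.api.AbstractBatchlet;
    import javax.inject.Named;

    @Named("initializeResourceImport")
    public class InitializeResourceImport extends AbstractBatchlet {

        // import and work locations are hard coded only to keep the demo short
        private static final Path IMPORT_FILE = Paths.get("/tmp/import/resources.txt");
        private static final Path WORK_DIR = Paths.get("/tmp/work");

        @Override
        public String process() throws Exception {
            Files.createDirectories(WORK_DIR);
            // move the incoming file to the work directory so it can be processed safely
            Files.move(IMPORT_FILE, WORK_DIR.resolve(IMPORT_FILE.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            return "COMPLETED";
        }
    }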

Here we move the file to the workDir location.

Next we start a flow and execute another step type – a chunked step. The id of the next element to execute is given in the next attribute.

Each of the phases is implemented by a CDI bean. So reading looks like this:
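
A reader along these lines will do (the file location is again an assumption of this sketch); it returns one line per invocation and reports the number of lines already read as its checkpoint:

    import java.io.BufferedReader;
    import java.io.Serializable;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import javax.batch.api.chunk.AbstractItemReader;
    import javax.inject.Named;

    @Named("resourceReader")
    public class ResourceReader extends AbstractItemReader {

        private BufferedReader reader;
        private long linesRead;

        @Override
        public void open(Serializable checkpoint) throws Exception {
            reader = Files.newBufferedReader(Paths.get("/tmp/work/resources.txt"),
                    StandardCharsets.UTF_8);
            // on restart, skip the lines that were already processed before the failure
            if (checkpoint != null) {
                long alreadyRead = (Long) checkpoint;
                for (long i = 0; i < alreadyRead; i++) {
                    reader.readLine();
                }
                linesRead = alreadyRead;
            }
        }

        @Override
        public Object readItem() throws Exception {
            // a single invocation returns a single line; null signals the end of the input
            String line = reader.readLine();
            if (line != null) {
                linesRead++;
            }
            return line;
        }

        @Override
        public Serializable checkpointInfo() throws Exception {
            return linesRead;
        }

        @Override
        public void close() throws Exception {
            reader.close();
        }
    }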

We open and close the stream in this step and provide a checkpoint definition in order to be able to restart from some point. A single invocation of this reader reads only a single line, which is passed down to the processing part of the step. It would be easier if the Batch framework provided some reading utilities and let us worry about doing important stuff with the data rather than about opening and properly closing files. But the foundation functionality provided now is also required, and features that make Batch Processing for the Java Platform easier to use will hopefully be added in the next versions of the spec.

The next phase processes the line of data read from the file – the only thing done here is the conversion from a String to a KnowledgeResource instance:
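
Something like this, where the fromLine factory method stands in for whatever parsing KnowledgeResource really needs:

    import javax.batch.api.chunk.ItemProcessor;
    import javax.inject.Named;

    @Named("resourceProcessor")
    public class ResourceProcessor implements ItemProcessor {

        @Override
        public Object processItem(Object item) throws Exception {
            // convert the raw line into a domain object; parsing details are elided here
            return KnowledgeResource.fromLine((String) item);
        }
    }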

And finally we can write the data:
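
A writer roughly like this (the add method on KnowledgeResources is an assumption of this sketch):

    import java.util.List;
    import javax.batch.api.chunk.AbstractItemWriter;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named("resourceWriter")
    public class ResourceWriter extends AbstractItemWriter {

        // application-scoped holder used only to make test assertions easy
        @Inject
        private KnowledgeResources knowledgeResources;

        @Override
        public void writeItems(List<Object> items) throws Exception {
            // the writer receives the whole chunk at once, not a single item
            for (Object item : items) {
                knowledgeResources.add((KnowledgeResource) item);
            }
        }
    }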

This is also simple processing, just to have some fun with the Java EE Batch framework without worrying about more job-related details. The important part here is that this phase receives a list of objects that are the results of the processing phase.

Note the use of an application-scoped CDI bean of class KnowledgeResources – this is done this way in order to ease testing. Of course, in a real-life batch job I would not think about using an application-scoped bean without any special handling – it’s like having a state field in a Servlet where requests store data 😉 Very bad idea. The job passes a result of processing that will be used by the Decider instance to route processing. This simplish demo uses a hard-coded value.

After processing the data in the resource-processor step we proceed to the decision step – path-decision. This step is also implemented by a CDI bean named pathDecider.

Depending on the value returned from the Decider instance, a route will be chosen. The steps to be invoked next are designated by id in the to attribute. Java-based config would allow getting rid of some typos here.

Again the implementation is simplish:
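
Stripped down to its bare bones it could look like this:

    import javax.batch.api.Decider;
    import javax.batch.runtime.StepExecution;
    import javax.inject.Named;

    @Named("pathDecider")
    public class PathDecider implements Decider {

        @Override
        public String decide(StepExecution[] executions) throws Exception {
            // the returned value is matched against the "on" attributes of the decision element;
            // a real decider would inspect the step executions or the collected data here
            return "STORE";
        }
    }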

The Decider chooses one of two steps (OK, it does choose the STORE step every time…).

CDI beans for both steps are similar:
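
For example (storeResources shown; rejectResources would differ only in what it does with the data; the getAll accessor on KnowledgeResources is again assumed):

    import javax.batch.api.AbstractBatchlet;
    import javax.inject.Inject;
    import javax.inject.Named;

    @Named("storeResources")
    public class StoreResources extends AbstractBatchlet {

        @Inject
        private KnowledgeResources knowledgeResources;

        @Override
        public String process() throws Exception {
            // the demo just reports what was collected; a real job would persist it somewhere
            System.out.println("Storing " + knowledgeResources.getAll().size() + " resources");
            return "COMPLETED";
        }
    }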

An important note about running the test – we have to use a Stateless EJB to start the batch because of class loader ordering – Arquillian does not see the Java EE Batch resources:
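
A minimal sketch of such a bean, assuming the JSL file from above is named resource-import.xml and lives in META-INF/batch-jobs:

    import java.util.Properties;
    import javax.batch.operations.JobOperator;
    import javax.batch.runtime.BatchRuntime;
    import javax.ejb.Stateless;

    @Stateless
    public class BatchImportService {

        public long startImport() {
            // looks up META-INF/batch-jobs/resource-import.xml and starts a new job execution
            JobOperator jobOperator = BatchRuntime.getJobOperator();
            return jobOperator.start("resource-import", new Properties());
        }
    }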

So we use BatchImportService to start the batch. In a real app we would probably use Claim Check-like processing in order to pass data to an SFTP server and trigger processing using a JMS message or a Web Service call. We could also monitor the SFTP server using some service (Oracle Service Bus, for example, can monitor SFTP and start processing).

And this is it – a simple evaluation of the basic features of Java EE Batch (or Batch Processing for the Java Platform). The next thing would be to add partitioned steps or splits. In the meantime – the code is on GitHub.

Apart from batch applications, we could use stream or event processing. The chosen architecture depends on the requirements of the system under construction. If order is important but processing delay is not, then batch processing is worth checking out. If we need to react more quickly, or we want to continue processing data even though processing of some part of it failed, and we want high reliability, then messaging may be the way to go. But all this is a subject for another article 🙂

Have a nice day and a lot of fun with Java EE!