Asynchronous processing of RESTful web services requests in Spring and Java EE worlds wrapped in Docker – part 2: JAX-RS

14th June 2016 Uncategorised No comments

A little while has passed and many great things have happened :)

Back to the Java EE world: when talking about cloud, containers and Docker we should consider WildFly Swarm. This JBoss child is a modularization of Java EE that goes way beyond the standard Java EE profiles (web, enterprise etc.). The main features of Swarm are:

  • Capability to construct application as a fat jar
  • Application’s fat jar will contain only the required Java EE specifications (with the implementation part of course) and frameworks (e.g. Logstash)
  • Capability to add necessary dependencies based on the structure of the build automation tool descriptor (Maven, Gradle)
  • Capability to construct customized deployments (via Shrinkwrap)

Some of these features make WildFly Swarm a little similar to Spring Boot. Of course there are differences, as configuration and deployment of a Java EE application are different from those of a Spring application. Let’s consider configuration – in Spring you can use Java configuration or XML (if you like), and every part of the framework is configured in a similar way. In Java EE there are different types and formats of configuration. EJBs and Servlets mostly don’t need XML nowadays, but with Java EE Batch you are stuck with XML. When using WildFly Swarm it is possible to do some additional configuration that will augment the Java EE configuration, but I would not bet that the API of Swarm or ShrinkWrap will not change.

Maybe Spring Boot can do more automatic configuration compared to WildFly Swarm. This lack of belief is dictated by the way Java EE and Spring are configured. If we consider CDI, we are not forced to add any configuration (starting from Java EE 7 even the beans.xml descriptor is not required). Looking at JPA, we can add datasources (the Swarm specific part) and define persistence units (the Java EE part). So we can configure the application and how it is deployed. In my opinion Spring Boot’s flexibility comes from Spring, not the other way around.

Enough of theoretical considerations. Let’s code. Spring Boot has an excellent project generator; WildFly Swarm has the WildFly Swarm Project Generator. The name is maybe not as catchy as Spring’s, but Swarm’s web app does basically the same thing and it is really smart to use it and not do the basic configuration yourself. For the demo app we need just the JAX-RS with CDI dependency. After downloading the application’s zip we need to move it to where we want it and import it into the IDE. Below is the pom file that I enhanced with the Swarm plugin configuration; the rest is generated:

Important parts:

  • We import the WildFly Swarm BOM to ensure matching versions are used. This is similar to the Spring IO Platform BOM
  • We use WAR packaging – more on this below
  • We can configure debug port and we do configure bind address

The packaging set in Maven impacts your application packaging and not the fat jar packaging. With WildFly Swarm we always get the packaged application and the fat jar generated by WildFly Swarm. If we used jar packaging, we would need to configure a custom deployment using ShrinkWrap like in this example. Despite what is stated in the WildFly Swarm docs, I could not run the demo application using jar packaging. So AFAIK if you use jar packaging you are forced to write your own main method that deploys your app.

What we did so far is create an application that deploys JAX-RS with CDI and other required dependencies, and only these. No EJBs, no JMS, no JSP. Great ! But if you peek into the WAR’s lib directory there are some libraries that could be excluded, like the Jackson JAXB module. Probably the dependency metadata should be stated more precisely in order to exclude some of these. But some, like the websockets lib, could be excluded. The Java API for WebSocket is a separate specification and could be specified as a separate WildFly Swarm dependency.

Moving on to the Java code, we need to create the JAX-RS application class:

And then JAX-RS service:

We have only one way to execute asynchronous processing in JAX-RS, and that is to use AsyncResponse, which can be injected using the @Suspended annotation. AsyncResponse allows us to resume processing when the response is ready, just like the Servlet 3.x asynchronous API. Optionally we can specify a timeout and a timeout handler that will be executed when processing times out.
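As a rough illustration of the suspend/resume idea, here is a sketch that uses only JDK classes, so it runs without a JAX-RS implementation. It is not the JAX-RS API: CompletableFuture stands in for AsyncResponse, complete() for resume(), and completeOnTimeout() for a timeout plus timeout handler. The class name SuspendResumeSketch, the method processTask and the worker pool are assumptions made for this example only.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class SuspendResumeSketch {

    // Hypothetical worker pool; in a container a managed executor would be the better choice.
    private static final ExecutorService WORKERS = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "worker");
        t.setDaemon(true); // do not keep the JVM alive because of the demo pool
        return t;
    });

    // Returns immediately; the "response" is completed later by a worker thread.
    static CompletableFuture<String> processTask(String taskName, long workMillis, long timeoutMillis) {
        CompletableFuture<String> response = new CompletableFuture<>();
        WORKERS.execute(() -> {
            try {
                Thread.sleep(workMillis);               // simulate slow processing
                response.complete("done: " + taskName); // plays the role of resuming the response
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                response.completeExceptionally(e);
            }
        });
        // Plays the role of a timeout handler: supply a fallback if no result arrived in time.
        return response.completeOnTimeout("timed out: " + taskName, timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

The caller gets the future back right away, which is the point of the exercise: the thread that accepted the request is free while a worker produces the result.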

The example code also contains a synchronous version of the method. Possible upgrades include the use of managed thread pools in Swarm instead of creating our own thread pool.

I had some trouble with a VirtualBox VM with Docker, so the app is not tested with Docker yet. I will do it in a few days. Code is available on GitHub, I will update it in a few days. In order to run the application, the wildfly-swarm:run goal needs to be executed after the project is built. I tried using the wildfly-swarm:start goal but WildFly was killed right after the build was completed. A dockerized WildFly example is available on Markus Eisele’s blog.

Another way is to run swarm-demo-swarm.jar from the target directory:

java -jar swarm-demo-swarm.jar

Once WildFly is up and running, the service can be accessed under http://localhost:8080/swarm/task-service/process-task?task-name=YOUR_TASK_NAME&task-param=parameter

To be continued… :)

Preventing user from typing invalid characters in AngularJS (no jQuery)

3rd May 2016 AngularJS, Web development No comments

This is actually pretty simple when done using a custom directive. The main design detail is to use ngModelController, which provides us with a few interesting possibilities:

  • getting view value ($viewValue property)
  • getting model value ($modelValue property)
  • setting view value ($setViewValue function)
  • access to $parsers and $formatters
  • access to $validators
  • access to field’s ngModel state properties like $touched, $pristine etc.

$validators are useful for validating user input and marking a field as invalid. But if we want to prevent the user from typing some character that we consider invalid, we should use $parsers. A function registered as a parser will be invoked every time a model value is to be updated. Registered functions are invoked in order of registration and the output of the previous function is the input to the next one – just like in the Chain of Responsibility design pattern. This gives us the opportunity to check the value that is to be set in the model and modify it. Let’s take a look at an example:

Directive comValidate requires ngModelController. Once ngModelController is injected into the directive’s link function we can register a parser function which will do the required tests of the value provided by the user. In the above directive a test is done to check if the value is a number. If it is, then newValue is passed on to the next $parser function. If not, then:

  • view value is updated and rendered – it is important to call $render otherwise user would see invalid character from time to time (although it would not be present in the model)
  • current model value is passed to the next $parser function instead of newValue. This is also important – as if we return e.g. undefined input will be cleared.

Example code on plunkr.

Have a nice day !

Asynchronous processing of RESTful web services requests in Spring and Java EE worlds wrapped in Docker – part 1: Spring

25th April 2016 Cloud, Docker, Spring No comments

Both Spring and Java EE provide the possibility of asynchronously processing requests sent to RESTful web services. I’ve read some good comparisons of Spring vs. JAX-RS REST APIs; some are quite old now. Regarding Spring I can recommend an article by Magnus Larsson. In this article Magnus used Spring Boot to develop a RESTful web service (or a REST service if you prefer this name ;)).

I wanted to build an example of a dockerized Spring REST service using Spring Boot and compare it to what Java EE can give us. In this post I will focus on Spring and in the next post I will develop a similar service in JAX-RS to show similarities and differences. In order for the architecture to be cohesive I will not use Spring Boot then; instead I will use WildFly Swarm – a Java EE solution that allows you to build your own custom container for Java EE based services. To be honest nowadays I prefer Spring when it comes to flexibility, but Java EE is really getting closer in the mainstream technologies – I mean messaging, batch, components, web frameworks (hey ! there will be a Java EE MVC Framework !).

If you are curious about WildFly Swarm, take a look at Markus Eisele’s article or the great Mastertheboss website.

Why async ? Because it’s more fun :) and in some cases it is viable to use an asynchronous service, especially as most JavaScript APIs nowadays use some kind of reactive or reactive-like API – like Q in AngularJS. We live in an async world.

So let’s get down to work. We will create a simple service that will accept request and then process it asynchronously allowing http connector thread to service some other request. The processing is well described in Spring Reference Docs – there is no point in repeating it here. The important things are:

  • There will be some performance penalty due to releasing thread and then re-attaching a thread to return data to client
  • We are talking about connector thread, not application thread – the latter is under your control in case of Spring (well in Java EE in most cases also)
  • Scalability will be better in cases where http threads just waited for application threads, so when application took some non-negligible time to process request
  • Fast processing requests may benefit also – if there are more http threads free to service requests, more fast requests can be processed, as the web server’s or application server’s thread pool is most likely used to service not only requests (it is like this in WildFly, WebLogic, Tomcat)

The first thing we need to do is to create a Spring Boot application – we will use @SpringBootApplication as this meta-annotation specifies @EnableAutoConfiguration, @ComponentScan and @Configuration. We will also add @EnableAsync so that one of our components can provide an asynchronous method (this is not the same as an async controller method):

Our service will invoke an EIS business delegate component – in a real app this could be a connector to an ESB, a core banking system or something like this.

In order to better see the asynchronous processing order we log some information, and simulate that processing takes 5 seconds. Notice that first process method is @Async, so it will be invoked in separate thread.

A simple DTO for task data:

And finally the REST service:

The first method (or operation in web services nomenclature) returns a DeferredResult and, as stated in the Spring docs, when an instance of DeferredResult is returned asynchronous request processing kicks in – response processing is suspended and the thread is released, but the HTTP connection is kept open, ready for the actual response to be sent when it is ready. Processing is restarted when setResult or setErrorResult is invoked on the DeferredResult instance. We can also register timeout and completion callbacks on the DeferredResult instance.

In case of EISBusinessDelegate line below is setting result:

We use @Async to simulate some long processing in separate thread and to show that DeferredResult can be passed to a separate layer and set response there.

This is the biggest difference compared to the next method, which returns a Callable. Here the result is returned by the RESTful controller. The third method adds the possibility of registering timeout and completion callbacks.

Exception processing is basically the same as for synchronous controller methods.
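To make the difference between the two styles concrete, here is a sketch using only JDK classes. It is not Spring code: CompletableFuture stands in for DeferredResult, a plain executor stands in for both @Async and the framework’s task executor, and all names (DeferredVsCallableSketch, delegateProcess, runLikeFramework) are made up for this illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class DeferredVsCallableSketch {

    // Hypothetical application thread pool (daemon threads so the demo JVM can exit).
    private static final ExecutorService APP_THREADS = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Style 1: the controller hands the result holder to another layer, which sets it later.
    static CompletableFuture<String> deferredStyle(String task) {
        CompletableFuture<String> deferred = new CompletableFuture<>();
        APP_THREADS.execute(() -> delegateProcess(task, deferred)); // stands in for an @Async call
        return deferred; // the calling thread is free again
    }

    // Hypothetical business delegate: it, not the controller, produces the result.
    private static void delegateProcess(String task, CompletableFuture<String> deferred) {
        deferred.complete("delegate processed " + task);
    }

    // Style 2: the controller itself describes the work and returns it as a Callable.
    static Callable<String> callableStyle(String task) {
        return () -> "controller processed " + task;
    }

    // Stand-in for the framework running the returned Callable on another thread.
    static String runLikeFramework(Callable<String> work) {
        try {
            return APP_THREADS.submit(work).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

In the first style any layer that holds the reference can set the result; in the second the result is simply whatever the controller’s Callable returns.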

In order to test our services we can use Spring’s REST Templates (full code available on github):


Spring Boot’s role in all this is of course to simplify processing and make the application more cloud friendly. We can build a fat jar, and Spring Boot takes care of things such as marking DispatcherServlet as async capable (<async-supported> – see Configuring Asynchronous Request Processing in the Spring ref docs). Let’s go one step further, especially as it is very easy to dockerize this app using the Spotify Docker Maven plugin. The important parts of pom.xml are:

docker.image.prefix will be used as the image prefix that identifies this image. See the imageName tag below – the second part of it will be the project’s artifact id.

Important detail – docker must run under root or sudo, but if you want to build docker image with maven plugin then Eclipse must also run under sudo or root. I haven’t tried Intellij Idea yet… I really should.

To build a Docker image we must write a Dockerfile like the one below:

FROM java:8 – tells docker to take an image that we will build upon.
VOLUME /tmp – instructs docker to create /tmp directory (it will be used by e.g. embedded Tomcat)
ADD asynch-service-1.0.0-SNAPSHOT.jar app.jar – adds our application from the local filesystem to image
EXPOSE 8080 – we expose port 8080
RUN sh -c 'touch /app.jar' – touch will ensure that the file modification timestamp is ok
ENTRYPOINT ["java","","-jar","/app.jar"] – we run our app (Spring Boot will take care of server etc.)

After the Docker image is built we can run it using a command like the one below (using sudo or root):

sudo docker run -p 8080:8080 --name container_name spiralarchitect/asynch-service

Code on GitHub.

Have a lovely day !

Ps. simplish test client has localhost:8080 hardcoded – be aware of this.

Variants of SEDA

17th April 2016 Architecture, EDA, JMS, Uncategorised No comments

SEDA (Staged Event Driven Architecture) was described in the Ph.D. thesis of Matt Welsh. I recently read his blog post that describes various ways SEDA was perceived and implemented. One thing that comes to my mind after reading this is that SEDA as a tool has been used in more or less correct ways, like many other patterns and architecture models.

That is because SEDA is a design model and architecture pattern, one that can be used with other models like SOA. In fact some ESBs use SEDA as a processing model, like Mule ESB or ServiceMix. SEDA is one of a few processing models in ServiceMix. But SEDA can be used elsewhere, like with Oracle Service Bus, and in more than one way:

  1. SEDA using messaging – here we would add a queue before a service that consumes event messages, possibly using SOAP/JMS binding. We can then add additional stages for events processed by this first service.
  2. Throttling capabilities of OSB – here we do not have a real thread pool, but using WebLogic’s Work Managers we achieve a similar goal; threads will be taken from the WebLogic thread pool. This processing model will work with every routing action.

I also used a variant of the first model, where services were exposed as SOAP/HTTP services, published a message on a queue and then sent a reply that the message had been queued. This way we can still use SOAP/HTTP messages, not forcing other systems to use messaging – either JMS, AMQP, STOMP or some other.

When done right SEDA will improve a system’s scalability and extensibility. Scalability is higher because if a peak of messages happens, the messages will be queued up to the limit of the storage quota. Messages that can’t be processed when arriving at a queue will wait for their turn. With the addition of reliable messaging, like JMS persistent delivery with an exactly-once guarantee, we can have a scalable and reliable message processing mechanism. Adding additional queues depends on the requirements of a specific domain or application. It may be viable to add queues not only because of scalability but also from an extensibility point of view. We would use topics here or publish the event to multiple queues, but the main point is that it would be possible and quite easy to add an additional event consumer in this model.
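The staging idea itself can be sketched without any messaging product. The example below is only an in-memory approximation under my own assumptions: bounded BlockingQueues stand in for JMS queues, one thread per stage stands in for a consumer pool, and the names (SedaSketch, the "__stop__" marker) are hypothetical. It has no persistence or delivery guarantees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class SedaSketch {

    private static final String POISON = "__stop__"; // hypothetical end-of-stream marker

    // Starts a stage: one thread that takes events from its queue and hands them to a handler.
    private static Thread stage(BlockingQueue<String> in, Consumer<String> handler) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String event = in.take();
                    handler.accept(event);            // the marker is passed to the handler too
                    if (POISON.equals(event)) return; // then the stage shuts down
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    private static void put(BlockingQueue<String> queue, String event) {
        try {
            queue.put(event); // blocks while the queue is full: producers are throttled
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Pushes events through two stages and returns what the last stage stored.
    static List<String> run(List<String> events, int queueCapacity) {
        BlockingQueue<String> normalizeQueue = new ArrayBlockingQueue<>(queueCapacity);
        BlockingQueue<String> storeQueue = new ArrayBlockingQueue<>(queueCapacity);
        List<String> stored = new ArrayList<>();

        // Stage 1: trim the event and forward it (trim leaves the marker unchanged).
        stage(normalizeQueue, e -> put(storeQueue, e.trim()));
        // Stage 2: "store" every real event.
        Thread store = stage(storeQueue, e -> { if (!POISON.equals(e)) stored.add(e); });

        events.forEach(e -> put(normalizeQueue, e));
        put(normalizeQueue, POISON);
        try {
            store.join(); // wait until the last stage has drained its queue
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stored;
    }
}
```

Each stage can be tuned on its own (queue capacity, number of consumer threads), and adding another stage means adding another queue and another consumer.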

SEDA may also be good for asynchronous request processing with long processing times, with the addition of NIO2 and the Servlet 3.x asynchronous processing model. In this case we would accept a request at some endpoint, like a Spring controller method, and then invoke an asynchronous method to do the backend processing. The HTTP thread could then process another request for another, potentially synchronous, endpoint. The backend processing service would process incoming requests and invoke completion handlers as it finishes (using Spring infrastructure). Here we would have at least two queues – the HTTP thread queue and the backend processing service queue. Both could be managed and adjusted to add higher scalability.

When considering SEDA it is important to remember that request processing will be slower than processing with one thread and no queues on the way. This is quite obvious, as dispatching, consuming and confirming an event message has its performance impact. SEDA can also impact a system’s maintainability, as event messages may get stuck in some queues or get redirected to dead letter queues, and this must be monitored. Fortunately it is not a big problem, especially on OSB (we can use alerts).

SEDA can be used incorrectly, just like EDA, CEP or SOA. Once again architecture design is probably the most important thing in a system’s development.

Have a nice day !

EDA in Java EE 7

3rd April 2016 EDA, Java EE, JMS, Software development 1 comment

Building EDA in Java EE 7 or Spring is easy. Let’s create a simple EDA routing example using JMS 2.0, EJB 3.2 and CDI 1.1 (the CDI version that is part of Java EE 7). We will focus on the messaging part, so we will not use anything fancy to route messages. We will publish messages to a single queue – the event channel. Messages will be received by an EDA component which will route them according to a header. Each message will be published to the appropriate event queue based on the header value and will be received by the appropriate component that monitors that queue. The example uses queues but it can be easily modified to use topics.

A few important things:

  • What is important is to keep the architecture clean: an event generator must not know anything about event consumers. An event instance should be created using an EventBuilder or some other helper class, so that the event source passes the required information and does not know anything about the details of EDA event construction.
  • An event consumer that does business processing must get business information from the event and trigger processing. Do not implement business code in the event consumer nor propagate EDA related details down to the domain services layer.
  • It is important to do routing based on a header if possible, as it is more lightweight.
  • Use the appropriate session mode (transacted or not, with the required acknowledgement mode)
  • Remember that messages can and will arrive out of order
  • Use one or a few Dead Letter Queues, and send messages that failed to be processed a few times to the related DLQ. The example code does not do this
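The first and third points can be sketched in plain Java, without JMS. Everything in the example is an assumption made for illustration: the class names, the "eventType" header and the in-memory queues that stand in for JMS destinations. Events with an unknown type are simply parked in a separate queue, which is not the same thing as a real DLQ for repeatedly failing messages.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

final class HeaderRoutingSketch {

    // Minimal event: routing information lives in headers, business data in the payload.
    static final class Event {
        final Map<String, String> headers;
        final String payload;

        Event(Map<String, String> headers, String payload) {
            this.headers = Map.copyOf(headers);
            this.payload = payload;
        }
    }

    // Hypothetical builder, so event sources never deal with routing details directly.
    static final class EventBuilder {
        private final Map<String, String> headers = new HashMap<>();
        private String payload = "";

        EventBuilder type(String type) { headers.put("eventType", type); return this; }
        EventBuilder payload(String value) { this.payload = value; return this; }
        Event build() { return new Event(headers, payload); }
    }

    // Router: looks only at the header, never at the payload.
    static final class Router {
        private final Map<String, Queue<Event>> queuesByType = new HashMap<>();
        private final Queue<Event> unroutable = new ArrayDeque<>();

        // Registers (or returns) the event queue for a given event type.
        Queue<Event> register(String type) {
            return queuesByType.computeIfAbsent(type, t -> new ArrayDeque<>());
        }

        void route(Event event) {
            Queue<Event> target = queuesByType.get(event.headers.get("eventType"));
            (target != null ? target : unroutable).add(event);
        }

        int unroutableCount() { return unroutable.size(); }
    }
}
```

The event source only talks to the builder, and the router never parses the payload, so consumers can be added by registering another queue.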

EDA - simple routing


Publishing message using EJB is super easy:

Important parts are discussed below.

Creating a JMSContext that manages the JMS Session and ConnectionFactory for us. This must NOT be created in session transacted mode in order for it to join the JTA transaction. There are cases where we want to manage transactions ourselves or send a message whether or not other parts of request processing failed. In those cases we could use a separate transaction (e.g. by setting TransactionAttribute to REQUIRES_NEW in the CMT model or by using the JTA 1.2 Transactional annotation) or we could create a transacted session and commit the local JMS transaction in publisher code. In most cases we want to publish events as part of some other processing in a transaction, so we must join the existing JTA transaction.

Thanks to AUTO_ACKNOWLEDGE mode session will not be transacted and will join existing JTA transaction. We can use this to span XA transactions if it would be required.

Creating message and setting routing header:

and finally sending message:

An event can be sent fully asynchronously in Java EE. In the code example above client code will block until the JMS provider acknowledges the message. Starting from Java EE 7 we can send a message and continue processing.

Publishing a message from a CDI bean is similar. A notable detail is the use of @Transactional, a JTA 1.2 annotation that allows CDI beans to work in a transaction context (but not in case of life cycle methods – here EJBs still have to be used).

Messages will be received by a Message Driven Bean. One of the strengths of using an MDB component is that messages are delivered asynchronously, with the possibility to scale the MDB resource share (thread pool, component instance pool). The EJB container takes care of other things that we must think of when using a CDI bean, like transaction management and the related message acknowledgement. A lot of configuration can be done declaratively using @ActivationConfigProperty. But I don’t intend to compare CDI with EJB here, so let’s see the code:

MDB implements MessageListener to clearly state it is a JMS MDB, as MDBs can also connect to other resource manager types. onMessage method will be invoked to process each message and each message will be processed separately, even though messages will be fetched in batches from JMS Provider. Default acknowledgment mode for MDB component is AUTO_ACKNOWLEDGE with DUPS_OK_ACKNOWLEDGE allowed to be set.

SimpleRoutingEventMessageBroker can be a source of other events. How you process events is system specific. In some cases simple XML or JSON routing specifications will be sufficient, in other cases a Business Rules engine will help to trigger events. CEP (Complex Event Processing) is a different league.

As the comment in the code above states, the MDB component that gets messages from the event channel should not take care of message processing or routing. Message processing should be executed in a dedicated component so that we can use different routing strategies and the component will be easier to maintain. Testing routing code separately from JMS can also be a plus – although we can test it with JMS, this test will be slower than regular JUnit tests.

Event consumer implementation is similar to SimpleRoutingEventMessageBroker MDB:

Message consumer CDI component is another story – see code below:

We must receive messages manually and we can and should give a timeout for receive operation. This component is transactional also, but it must be called by some other component in order to receive messages.

We can test the code above using Arquillian. In case of both publishers we will test that the message is not sent if an exception is thrown.

The second test is similar, so if you’re interested, check GitHub.

In some cases it may make more sense to use a lightweight integration framework like Spring Integration or Apache Camel to create an internal EDA component. The most efficient tool should be used to do the job.

Have a nice day !

Is Reactive similar to Event Driven Architecture ?

21st March 2016 Architecture, Reactive, Software development No comments

Reactive is getting more and more popular. I decided to check it out and started to do some research. A thought came into my mind – is Reactive related to EDA ? If you check the Reactive Manifesto then the quick answer would be yes, it is. But it is more complicated than that.

What is Reactive Programming ? According to the manifesto it is a way of building systems so that they have certain characteristics:

  • responsiveness – they respond in timely manner
  • resilience – they are fault tolerant and highly available
  • elastic – I must cite the Manifesto here :) “The system stays responsive under varying workload.” You’ll see why
  • message driven – reactive systems use asynchronous messages to propagate information and achieve loose coupling

Some points in the manifesto are more detailed, some less – but it is a manifesto, not a tech spec. So don’t expect too much. The Manifesto does describe an approach to the design and development of systems, though. But it is not an architecture, nor a design pattern. Maybe the Manifesto’s level of detail is the reason that most of the articles or presentations I came across were about the Reactive approach to programming and did not touch architecture design.

An important fact about the Reactive Manifesto is that it sets goals but only gives some advice in case of resilience and message orientation. I can’t help thinking about it as a… Manifesto… :) I’ll get back to Reactive in a minute.

EDA (Event Driven Architecture) is well known, proven architecture used to decouple system’s modules, provide better scalability. EDA architecture is achieved when components communicate using events that are carried in messages. Important factor is that event generator, that is component that emits event, does not know anything about consumers. EDA is extremely loosely coupled.

Most common way to propagate events is to use messaging system, but EDA can be done in various ways depending on the system being developed:

  • Single instance (JVM in case of Java) processing can be implemented using CDI events, simple observer pattern implementation, asynchronous invocations.
  • Event processing with several application instances – here we would probably use some messaging technology (MOM), but we are not limited to this:
    • remote method invocations would also do
    • one-way soap web services would do
    • restful web services with asynchronous invocations

Using messaging technology like JMS (HornetQ, ActiveMQ, SonicMQ among others) or AMQP (RabbitMQ, Apache Qpid) will give us a few important benefits:

  • Reliability: both JMS and AMQP provide a few message production and consumption modes. JMS provides persistent and non-persistent delivery, durable subscriptions (subscribers will receive messages even if they weren’t listening at the time of publication) and a few acknowledgement modes (allowing duplicate messages or not; an exactly-once guarantee is available with a JTA transacted session, and XA may be required).
  • Great scalability: messages can be queued and processed without the system being overwhelmed, even in case of a peak of messages a few times greater than normal processing. Message processing components can be scaled independently from others – e.g. the number of threads for a given MDB component can be increased, allowing events to be consumed faster.
  • Fault tolerance: both JMS and AMQP can survive message broker crashes.

EDA can have one of a few implementation types:

  • simple event processing – like in case of observer pattern implementation, some event is generated by event generator and consumed by observers
  • streaming event processing – here events are routed and processed and can be the source of other events
  • complex event processing – in this case not only current event is analyzed but also past events are taken into analysis scope, with some sophisticated event stream queries (like in Oracle CEP).

So as we can see EDA can be:

  • resilient – if done using appropriate tools that will guarantee delivery, fault tolerance, high availability
  • elastic – this is why I wanted to cite the manifesto. If you do EDA using a messaging system then you get elasticity. The system can and will cope with peaks in events; the messages will get queued, possibly spread across a cluster of the messaging system’s nodes. Even more – using SEDA architecture we can throttle and dynamically control throughput
  • using messaging – we can use messaging and most of the time we do
  • responsive – due to asynchronous nature of EDA system will be responsive. It will require a different style of programming though.

As you can see there are similarities between EDA and Reactive. But as I mentioned before most of the time Reactive is about how to implement details of code and does not touch architecture level. On the other hand EDA is all about architecture – it tells about components and way to connect them so they can interact.

Reactive is more about how to structure your code, get rid of some flow statements, and replace pull / imperative style with push streams. As Introduction to Reactive Programming says, Reactive programming is programming with asynchronous data streams. Another nice introduction is Jonathan Worthington’s presentation. We must differentiate between Reactive Programming and the Reactive approach to architecture. The second one is less common.
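To give a feeling for what “push instead of pull” means, below is a small sketch that uses the java.util.concurrent.Flow API (available in the JDK since Java 9, so newer than the Java versions discussed in this post). The class PushStreamSketch and its collect method are hypothetical; the subscriber just upper-cases whatever is pushed to it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class PushStreamSketch {

    // Publishes the items asynchronously and returns what the subscriber received.
    static List<String> collect(List<String> items) {
        List<String> received = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);

        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<String>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription s) {
                    subscription = s;
                    s.request(1); // ask for the first item
                }
                @Override public void onNext(String item) {
                    received.add(item.toUpperCase()); // react to the pushed item
                    subscription.request(1);          // signal readiness for the next one
                }
                @Override public void onError(Throwable t) { done.countDown(); }
                @Override public void onComplete() { done.countDown(); }
            });
            items.forEach(publisher::submit); // the producer pushes; nobody polls
        } // closing the publisher signals onComplete after pending items are delivered

        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }
}
```

There is no loop in the consumer that asks “is there more data ?”; the subscriber only states how much it can take and reacts when items arrive.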

This lets me think that EDA can be Reactive and most of the time Reactive is just more about coding your solution using streams, observables or promises.

And btw. doing EDA in simple or streaming strategy is extremely easy in Java ;)

Have a nice day !



Batch processing for Java Platform – partitions and error handling

13th March 2016 Batch, Software development No comments

In the previous post I wrote about basic Batch processing for Java Platform capabilities. I have now tried out partitions and error handling. Let’s look at error handling first. With chunk style steps we have the following options:

  • Skipping a chunk step when a skippable exception is thrown.
  • Retry a batch step when a retryable exception is thrown. Transaction for current step will be rolled back unless exception is also configured as no-rollback type exception
  • No rollback for given exception class

Processing applies to exceptions thrown from all phases (read, process, write) and checkpoint commit. We can specify which exception classes are to be included in each option – that is, whether a given exception class is a skippable exception. We can also exclude some exception classes, and the batch processing engine will use a nearest class algorithm to choose the appropriate strategy. If we include exception class A as skippable and this class has two subclasses B and C, and we configure an exclude rule for class C, then exceptions A and B will cause the engine to skip the step while exception C will not cause the engine to skip step execution. If no other strategy is configured then job execution will be marked as FAILED for C.
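The A/B/C example can be played through with a few lines of Java. This is only my own sketch of a nearest class rule, not code from any batch implementation: NearestClassRule, distance and isSkippable are hypothetical names, and the tie-breaking (an exclude rule wins when it is at least as near as the include rule) is an assumption.

```java
import java.util.List;

final class NearestClassRule {

    // Number of superclass steps from 'type' up to 'ancestor', or -1 if unrelated.
    static int distance(Class<?> type, Class<?> ancestor) {
        int steps = 0;
        for (Class<?> c = type; c != null; c = c.getSuperclass(), steps++) {
            if (c == ancestor) return steps;
        }
        return -1;
    }

    // Smallest distance to any of the candidate classes, or -1 if none is an ancestor.
    private static int nearest(Class<?> type, List<Class<?>> candidates) {
        int best = -1;
        for (Class<?> candidate : candidates) {
            int d = distance(type, candidate);
            if (d >= 0 && (best < 0 || d < best)) best = d;
        }
        return best;
    }

    static boolean isSkippable(Class<? extends Throwable> thrown,
                               List<Class<?>> includes, List<Class<?>> excludes) {
        int include = nearest(thrown, includes);
        int exclude = nearest(thrown, excludes);
        if (include < 0) return false;           // no include rule covers this exception
        return exclude < 0 || include < exclude; // the nearer rule decides
    }

    // Hypothetical hierarchy matching the A/B/C example from the text.
    static class A extends Exception {}
    static class B extends A {}
    static class C extends A {}
}
```

With A included and C excluded, A and B are skippable and C is not, which matches the behaviour described above.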

Let’s see an example

Here we skip step for pl.spiralarchitect.kplan.batch.InvalidResourceFormatException exception. We add code that throws this exception to factory method for KnowledgeResource:

In test file we make one line invalid:

Step for this line will be skipped.

For Batchlet style steps we need to handle exceptions ourselves. If an exception gets out of the step, the job will be marked as failed.

Another more advanced feature is partitioning of steps. Partitioning is available for both chunk as well as batchlet style steps. Consider example xml below:

In this configuration we specify that there are to be two partitions and two threads are to be used to process them, so one thread for a partition. This configuration can be also specified using a partition mapper, as the comment in xml configuration snippet describes.

Partition collector’s role is to gather data from each partition to analyzer. There is a separate collector for each thread.

Partition analyzer is to collect data from all partitions and it runs on main thread.  It can also decide on batch status value to be returned.

In order to understand how this works it may be helpful to look at the algorithm descriptions in the JSR spec, chapter 11. An important detail described there is that for each partition step intermediary data is stored in StepContext. With this knowledge we can create a super simple collector – keeping in mind that we could process the intermediary result here, we just don’t need to do this:

This collector will be executed for each step with the result returned by the writer step – you can find the modified KnowledgeResourceWriter below

Then we can write code for super simple analyzer:

KnowledgeResourcePartitionAnalyzer adds all resources to same list that writer step did. It also sets parameters used by next step, that were previously set in writer.

Now when we execute modified example we will see that we have some strange output:

Yikes !

1. We did not tell each partition what data it should analyze, nor did we put this logic (to decide what data should be processed) in the step definition itself.

2. storing resource: KnowledgeResource [title=Hadoop 2 and Hive, published= 2015] is repeated four times ! – this is because:

“The collector is invoked at the conclusion of each checkpoint for chunking type steps and again at the end of partition; it is invoked once at the end of partition for batchlet type steps.”

Back to the lab then

To fix the first problem we simply split the file from the first step into two files – one for each partition. We will create the two files statically (articles1.txt and articles2.txt). Each partition will read one file – the file name will be configured (a simplification for the demo – in a real-life application this would probably be a bit more complicated). So we need to change the file-reading implementation and the configuration.

The important lines in the configuration above are:

* <jsl:property name="fileName" value="articles1.txt"/> and <jsl:property name="fileName" value="articles2.txt"/> – here we configure the file name for each partition

* <jsl:property name="fileName" value="#{partitionPlan['fileName']}"/> – here we configure the file name for the reader component; this needs to be done to transfer the configured file name from the partition plan to the step properties (an inconvenience ;) – JSR-352 is young).

The reader component gets fileName injected (a cool feature of the young JSR):

And it will use this file name to construct the resource path:

This fixes the first problem.

To fix the second problem we need to know whether this is the end of the partition. We can check it in a few ways:

* check if we processed a given number of elements, which would require some service that monitors the processing of all partitions

* check the thread name – partitions are executed on separate threads; this is a poor solution, as threads may be pooled, so it won’t work

* check if we already processed a given element (using some hash)

I haven’t found any API that would tell whether partition execution is done and that this is the last call to the collector. Maybe this last call is an error in JBeret (JBoss’ implementation of Batch Processing for Java Platform).

We will try the last solution. We could check in the analyzer whether we already processed a chunk, but since this is more of a technical detail (the important thing was to find out why we got the duplicated element) we will just check this in the data structure for KnowledgeResources – we will simply replace List with Set:
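The original listing is not reproduced here, so what follows is only a minimal sketch of the idea. It assumes that KnowledgeResource implements equals and hashCode over its title and publication year (the real class may differ); DuplicateFilterDemo and storedCount are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for the article's KnowledgeResource class.
final class KnowledgeResource {
    private final String title;
    private final int published;

    KnowledgeResource(String title, int published) {
        this.title = title;
        this.published = published;
    }

    // A Set can only detect duplicates if equals and hashCode compare the content.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof KnowledgeResource)) {
            return false;
        }
        KnowledgeResource that = (KnowledgeResource) other;
        return published == that.published && Objects.equals(title, that.title);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, published);
    }
}

class DuplicateFilterDemo {

    /** Adds all deliveries to the given store and reports how many entries it holds. */
    static int storedCount(Collection<KnowledgeResource> store, List<KnowledgeResource> deliveries) {
        store.addAll(deliveries);
        return store.size();
    }

    public static void main(String[] args) {
        // The same resource delivered twice, as happens with the repeated collector call.
        List<KnowledgeResource> deliveries = Arrays.asList(
                new KnowledgeResource("Hadoop 2 and Hive", 2015),
                new KnowledgeResource("Hadoop 2 and Hive", 2015));

        System.out.println("List keeps: " + storedCount(new ArrayList<>(), deliveries));
        System.out.println("Set keeps:  " + storedCount(new LinkedHashSet<>(), deliveries));
    }
}
```

LinkedHashSet is used in the sketch only to keep insertion order; a plain HashSet would filter the duplicates just as well.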

After these changes we get the correct number of processed entries :)

Take a look at this post on Roberto Cortez’s blog regarding Java EE 7 batch processing.

Have a nice day! The code is on GitHub.


Trying out Java EE Batch

7th March 2016 Batch, Java EE, Software development No comments

I decided to try Java EE Batch, then try Spring Batch and finally Spring for Hadoop with Pig. Java EE Batch is a relatively new specification, but for a number of use cases it will be sufficient. This sentence says it all – I think that Java EE Batch is OK. Nothing more yet. As you will see, you have to do a lot of things yourself, and doing other things is just cumbersome.

So what is Java EE Batch, a.k.a. JSR 352 (Batch Processing for Java Platform)? It is a Java EE family specification, heavily inspired by Spring Batch, describing a framework for batch job processing. It allows us to describe jobs in Job Specification Language (JSL) – only the XML format is supported. I guess that for batch apps this may be OK. I can live with it ;)

Java EE Batch specifies two types of processing:

  • Chunked – here we have 3 phases – reading, processing and writing. Chunked steps can be restarted from some Checkpoint.
  • Batchlet – this is a free form step – do whatever is required.

Java EE Batch has many features like:

  • Routing processing using decision steps
  • Grouping steps into flows
  • Partitioned steps – step instances to be executed in parallel. A partitioned step allows splitting a job across multiple threads in a more technical way. We can specify a partition mapper, partition reducer and other partition-specific elements. I won’t go into detail on this now.
  • Splits – splits specify flows to be executed in parallel – there can be a few flow definitions, so this is a more process- or business-oriented parallelization of work.
  • Exception handling:
    • Skipping steps
    • Retrying steps
  • JobContext to pass batch job data to steps

CDI is used and supported. But even with all of this, specifying job context parameters is inconvenient (in 2005 I would say that it’s OK ;)), and we must do a lot ourselves – like moving files, reading them and passing intermediate results. It would be nice to have some utilities to help with standard batch-related chores.

OK, let’s see an example – below you can find a JSL definition for a demo batch job. You can build this XML with ease using the Eclipse XML editor (or use some plugins for NetBeans) – just choose the XML from schema option and then select XML Catalog and the jobXML_1_0.xsd entry:


First we move the file from the import location to some temporary location so it can be processed without any problem:

next="resource-router" specifies the next step to be invoked – there is a possibility for a typo here. Another inconvenience.

<jsl:batchlet ref="initializeResourceImport" /> – specifies a batchlet (free-form) step that will move the file. The class for this step will be instantiated as a CDI bean named "initializeResourceImport":

Here we move the file to a workDir location.

Next we start a flow and execute another step type – a chunked step. The id of the next step is given in the next attribute.

Each of the phases is implemented by a CDI bean. So reading looks like this:

We open and close the stream in this step, and we provide a checkpoint definition so that the step can be restarted from some point. A single invocation of this reader reads only a single line, which is passed down to the process part of the step. It would be easier if the Batch framework provided some reading utilities and let us worry about doing the important stuff with the data rather than about opening and properly closing files. But the foundation functionality that is provided now is also required, and features that make using Batch Processing for Java Platform easier will hopefully be added in the next versions of the spec.
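To make the checkpoint idea concrete, here is a minimal plain-Java sketch. LineItemReader is a local stand-in that copies the method names of javax.batch.api.chunk.ItemReader so that the example runs without a Java EE container. Reading from a String instead of a file in workDir, and using the number of lines read as the checkpoint, are assumptions of this sketch and not necessarily what the demo project does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Serializable;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the method names of javax.batch.api.chunk.ItemReader.
interface LineItemReader {
    void open(Serializable checkpoint) throws IOException;
    Object readItem() throws IOException;
    Serializable checkpointInfo();
    void close() throws IOException;
}

class CheckpointedLineReader implements LineItemReader {
    private final String content; // hypothetical: stands in for the imported file
    private BufferedReader reader;
    private int linesRead;

    CheckpointedLineReader(String content) {
        this.content = content;
    }

    @Override
    public void open(Serializable checkpoint) throws IOException {
        reader = new BufferedReader(new StringReader(content));
        linesRead = 0;
        // On restart the checkpoint holds the number of lines already processed.
        int linesToSkip = (checkpoint == null) ? 0 : (Integer) checkpoint;
        while (linesRead < linesToSkip && reader.readLine() != null) {
            linesRead++;
        }
    }

    @Override
    public Object readItem() throws IOException {
        String line = reader.readLine(); // null signals the end of the input
        if (line != null) {
            linesRead++;
        }
        return line;
    }

    @Override
    public Serializable checkpointInfo() {
        return linesRead;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    /** Demo helper: reads all lines that remain after the given checkpoint. */
    static List<String> readAllFrom(String content, Serializable checkpoint) {
        CheckpointedLineReader itemReader = new CheckpointedLineReader(content);
        List<String> lines = new ArrayList<>();
        try {
            itemReader.open(checkpoint);
            for (Object item = itemReader.readItem(); item != null; item = itemReader.readItem()) {
                lines.add((String) item);
            }
            itemReader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String file = "first\nsecond\nthird";
        System.out.println(readAllFrom(file, null)); // fresh start
        System.out.println(readAllFrom(file, 1));    // restart after one processed line
    }
}
```

The point of the design is that open() receives the last checkpoint (or null on a fresh start), so a restarted step can skip what was already processed instead of starting over.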

The next phase is processing the line of data read from the file – the only thing done here is conversion from String to a KnowledgeResource instance.
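As a rough illustration of such a conversion, here is a small sketch. The line format "title;year" is purely an assumption (the format of the demo file is not shown in this post), and both KnowledgeResource and KnowledgeResourceLineParser are hypothetical stand-ins written for this example.

```java
// Hypothetical stand-in for the article's KnowledgeResource class.
final class KnowledgeResource {
    private final String title;
    private final int published;

    KnowledgeResource(String title, int published) {
        this.title = title;
        this.published = published;
    }

    String getTitle() {
        return title;
    }

    int getPublished() {
        return published;
    }

    @Override
    public String toString() {
        return "KnowledgeResource [title=" + title + ", published=" + published + "]";
    }
}

class KnowledgeResourceLineParser {

    /** Converts one line in the assumed "title;year" format into a KnowledgeResource. */
    static KnowledgeResource parse(String line) {
        int separator = line.lastIndexOf(';');
        if (separator < 0) {
            throw new IllegalArgumentException("Expected 'title;year' but got: " + line);
        }
        String title = line.substring(0, separator).trim();
        int published = Integer.parseInt(line.substring(separator + 1).trim());
        return new KnowledgeResource(title, published);
    }

    public static void main(String[] args) {
        System.out.println(parse("Hadoop 2 and Hive;2015"));
    }
}
```

In the batch job itself this logic would sit inside the processor bean; keeping it in a plain static method makes it easy to test without a container.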

And finally we can write the data

This is also simple processing, in order to have some fun with the Java EE Batch framework without worrying about more job-related details. The important part here is that this phase receives a list of objects that are the results of the processing phase.

Note the use of an application-scoped CDI bean of class KnowledgeResources – this is done this way in order to ease testing. Of course, in a real-life batch job I would not think about using an application-scoped bean without any special handling – it’s like having a state field in a Servlet where requests store data ;) Very bad idea. This job passes a result of processing that will be used by the Decider instance to route processing. This simplish demo uses a hard-coded value.

After processing data in the resource-processor step we proceed to the decision step – path-decision. This step is also implemented by a CDI bean, named pathDecider.

Depending on the value returned from the Decider instance, some route will be chosen. Steps that are to be invoked next are designated by id in the to attribute. Java-based config would help here to get rid of some typos.

Again implementation is simplish:

The Decider chooses one of two steps (OK, it does choose the STORE step every time…):

CDI beans for both steps are similar:

An important note about running the test – we have to use a Stateless EJB to start the batch because of class loader ordering – Arquillian does not see Java EE Batch resources:

So we use BatchImportService to start the batch. In a real app we would probably use Claim Check-like processing in order to pass data to an SFTP server and trigger processing using a JMS message or a Web Service call. We could also monitor SFTP using some service (Oracle Service Bus, for example, can monitor SFTP and start processing).

And this is it – a simple evaluation of the basic features of Java EE Batch (or Batch Processing for Java Platform). The next thing would be to add partitioned or split features. In the meantime – the code is on GitHub.

Apart from batch applications we could use stream or event processing. The chosen architecture depends on the requirements for the system under construction. If order is important but processing delay is not, then batch processing is worth checking. If we need to react more quickly, or we want to continue processing data even though processing of some part of it failed, and we want high reliability, then messaging may be the way to go. But all this is a subject for another article :)

Have a nice day and a lot of fun with Java EE!

JPA 2.1 – Converters

20th February 2016 Java EE, JPA, Software development No comments

Converting values between the object data model and the relational data model in JPA 2.1 is super easy. Here are the steps:

1. Declare a converter for the value pair that you want to be converted. Let’s say that we want to convert boolean values to integer, then we declare a converter like this:

The converter has to implement javax.persistence.AttributeConverter<ObjectModelType, RelationalModelType>. This interface declares two methods with very clear names – at first sight one can tell what they are responsible for.
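A minimal sketch of such a converter follows. To keep it runnable without a JPA dependency, AttributeConverter is declared locally here with the same two method names as javax.persistence.AttributeConverter; in a real project the class would implement the JPA interface and carry the Converter annotation. The class name BooleanToIntegerConverter and the choice to map null to null are assumptions of this sketch.

```java
// Local stand-in with the same method names as javax.persistence.AttributeConverter.
interface AttributeConverter<X, Y> {
    Y convertToDatabaseColumn(X attribute);
    X convertToEntityAttribute(Y dbData);
}

// In a real project: annotate with @Converter and implement the JPA interface instead.
class BooleanToIntegerConverter implements AttributeConverter<Boolean, Integer> {

    /** Object model to relational model: true becomes 1, false becomes 0. */
    @Override
    public Integer convertToDatabaseColumn(Boolean attribute) {
        if (attribute == null) {
            return null; // assumption: keep SQL NULL for a missing value
        }
        return attribute ? 1 : 0;
    }

    /** Relational model to object model: any non-zero value is treated as true. */
    @Override
    public Boolean convertToEntityAttribute(Integer dbData) {
        if (dbData == null) {
            return null;
        }
        return dbData != 0;
    }

    public static void main(String[] args) {
        BooleanToIntegerConverter converter = new BooleanToIntegerConverter();
        System.out.println(converter.convertToDatabaseColumn(Boolean.TRUE));
        System.out.println(converter.convertToEntityAttribute(0));
    }
}
```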

Instead of using annotation you can use XML descriptor of course.

2.a Declare the Converter class to be applied automatically – just add the autoApply attribute set to true to the Converter annotation:

2.b If the Converter is not applied automatically, then annotate the entity, mapped superclass or embeddable class attribute with the javax.persistence.Convert annotation (or add the appropriate XML element to the XML descriptor).

converter attribute specifies Converter class to be used for conversion.

If you annotate an attribute that is not a basic type or an element collection of basic types, then you must also add the attributeName attribute to the annotation to tell JPA which attribute the converter should be applied to.

The Convert annotation can also exclude or override default conversion, e.g. if we have defined some other converter like:

then this converter would not be applied to the aforementioned locked field, because another converter is specified directly on the field.

In order to exclude this attribute from conversion, the Convert annotation must have the disableConversion attribute specified with value true.

Some hints:

According to the JPA 2.1 specification, point 11.1.10 Convert Annotation, the Convert annotation should not be applied to id attributes, version attributes, relationship attributes, or attributes annotated as Enumerated or Temporal.

The Convert annotation can be applied to an entity class that extends a mapped superclass to override the conversion mapping, see point 11.1.10 Example 11: Override conversion mappings for attributes inherited from a mapped superclass.

When using a converter in a Java SE environment one needs to specify the converter class in persistence.xml, same as with entity classes.

That’s it! Have a nice day :)

Setting up Arquillian for Java EE testing

30th January 2016 Java EE, Testing No comments

This post is about aliens. There will be a new X-Files series. I watched some episodes; I’m not a big fan, but the show had its good sides. In Java, aliens have been present for some time, I don’t even remember when they landed. My first 3rd-degree encounter with them was around 2012. The alien that I saw looked a little bit like this:


Arquillian (difficult name – but hey! he’s an alien) is a platform for running Java EE component tests inside Java EE, CDI and Servlet containers. There are modules available for functional testing (Graphene / Drone), persistence layer testing and others. It can manage embedded containers, manage containers by itself, or attach to an external container started in some other way. Arquillian creates deployment archives for these containers and provides us with test runners. Available containers are listed in the modules section of the Arquillian home page and in the reference guide. The modules section is more up to date, but it does not provide a complete guide on how to set up a test environment.

Arquillian lets you test different component types depending on the container used, but most of the time you will probably test with a full Java EE container or a Servlet container. Arquillian can even download the server via the maven resources plugin, which will be useful for the demo now. I think I may reuse this setup for the next posts about Java EE, but then I will use an external Wildfly or some other Java EE server. I would like to build a knowledge management application in Java EE using Domain Driven Design.

Why would I need module testing? I prefer to test integrated modules, as a lot of bugs are related to errors in cooperation between modules. With rigorous unit testing (that is, tests that test a single unit of code) and design it may be possible to provide comparable quality; after all, every component provides an API that can be tested. But then again, a unit test that covers the case where a component throws InvalidArgumentException will not prepare us for actually handling this exception in client components. Yes, it can be handled in the client component (or some higher-level component up the call stack), but you also must detect the cases where you forgot to take care of it – and this is where rigorous unit testing comes into consideration. In real-life projects it’s sometimes not so easy. After all, software development is about providing software that delivers business value and meets the required quality and architecture metrics. Having this in mind, I prefer to create automated functional (e2e) tests (always, for non-trivial projects) and then tests that are more module or unit tests, depending on the component. Testing integrated code often requires a container – like Spring or a Java EE container. This is where Arquillian comes in.

Getting back to our alien – setting up Arquillian is not easy. My setup is not the most up-to-date one. I tried to update it, spent some time fighting with the Wildfly / Arquillian configuration and decided that using Arquillian Universe instead of the older setup is not the most important thing. After all, we will be using Wildfly 8.1 and Java EE 7 – good enough ;)

First thing is to set up maven configuration:

1. Set up integration test plugin

An important detail here is to set up the system property variables, as these will be used by Arquillian and passed to Wildfly. We will use standalone-full.xml so that we can test APIs like JMS that aren’t available in the standalone config.

2. Download and configure Wildfly

3. Dependencies

Here we have setup in parent pom – notice

  • jboss-javaee-all-7.0 dependency – we do not use the javaee-api dependency as it contains only the API without an implementation (look at warning here)
  • arquillian-bom – this pom contains all dependencies required for Arquillian; we import it so that we do not have to specify the full dependency configuration ourselves

and core module:

Now we can set up Wildfly in standalone-full.xml. The important part is overriding the port values here – originally the HTTP management port conflicts with an NVidia application on Windows. We may also override the HTTP port (but we are not really required to do so):

Finally, the setup in arquillian.xml, a configuration file used by Arquillian. We just need to tell Arquillian the number of the management port (set up in standalone-full.xml):

In order to create a test we need to specify the deployment that Arquillian will work with. We do it using the ShrinkWrap module:

Now we are ready to do Java EE module testing:

Example code is on GitHub.

Have a nice day ;)