Spring Batch 2.0 – Part I – Simple Tasklet

There is always a healthy debate when Java and batch processing come up together. When I heard about Spring Batch, I had to try it out. On a previous project, many eons back, I did some batch processing in Java. What hurt me there (after a lot of optimization) was a call to another person's module. His module happily loaded up an entity bean. You can guess where that ended. In the next release I went through the code and replaced the entity bean calls with ONE update SQL statement. That fixed things. I was processing 200k records in 15-20 minutes, with an extremely small memory footprint. I could have reduced even this further by tuning another module, but the performance was deemed good enough and we moved on.

What I personally took from that experience was the need for a decent Java-based batch processing framework. Of course, having one does not mean you should always use Java for batches. Sometimes, for bulk processing, doing the work in the database may be the right approach.

In this blog I want to go over Spring Batch processing. We will start off with some definitions.

Job – A job represents your entire batch work. Say that each night you need to 1) gather all of the day's credit card transactions, 2) collect them in a file and then 3) send the file over to the settlement provider. Here I defined three logical steps. In Spring Batch a job is made up of Steps, each Step being a unit of work.

Step – One unit of work within a job. A job is made up of one or more steps.

JobInstance – A running instance of the job that you have defined. Think of the Job as a class and the JobInstance as, well, an object of that class. Our credit card processing job runs 7 days a week at 11pm. Each nightly run is a JobInstance.

JobParameters – Parameters that go into a JobInstance. The job together with its parameters (for example, the run date) is what identifies a JobInstance.

JobExecution – Every attempt to run a JobInstance results in a JobExecution. Say that for some reason the Jan 1st, 2008 CC settlement job failed. It is re-run and this time it succeeds. So we have one JobInstance but two attempts (thus two JobExecutions). There is also the concept of a StepExecution, which represents an attempt to run a Step in a Job.

JobRepository – This is the persistent store for the metadata about our jobs: the job instances, job executions and step executions. In this example I set up the repository to use an in-memory store. You can back it with a database if you want.

JobLauncher – As the name suggests, this object lets you launch a job.

Tasklet – Used in situations where a step does not have input and output processing (using readers and writers) and just needs to run one piece of code. We use tasklets in this blog.
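To make the Job / JobInstance / JobExecution distinction above a bit more concrete, here is a small plain-Java model of the Jan 1st settlement example. This is only a sketch: `InstanceKey`, `ToyJobRepository`, the job name `ccSettlement` and the parameter `run.date` are names I made up for illustration and are not Spring Batch classes or settings. The one idea it borrows from the framework is that the same job name plus the same parameters means the same instance, no matter how many times it is attempted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative model only; these are NOT the Spring Batch classes.
// An instance is identified by the job name plus its parameters.
final class InstanceKey {
    final String jobName;
    final Map<String, String> parameters;

    InstanceKey(String jobName, Map<String, String> parameters) {
        this.jobName = jobName;
        this.parameters = Collections.unmodifiableMap(new HashMap<>(parameters));
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof InstanceKey)) {
            return false;
        }
        InstanceKey other = (InstanceKey) o;
        return jobName.equals(other.jobName) && parameters.equals(other.parameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, parameters);
    }
}

// Toy in-memory store: one entry per instance, one status per attempt.
final class ToyJobRepository {
    private final Map<InstanceKey, List<String>> executions = new HashMap<>();

    void recordExecution(InstanceKey key, String status) {
        executions.computeIfAbsent(key, k -> new ArrayList<>()).add(status);
    }

    int instanceCount() {
        return executions.size();
    }

    List<String> executionsFor(InstanceKey key) {
        return executions.getOrDefault(key, Collections.<String>emptyList());
    }
}

class JobModelDemo {
    // First attempt fails, the re-run with the same parameters succeeds.
    static ToyJobRepository runSettlementExample() {
        ToyJobRepository repo = new ToyJobRepository();
        Map<String, String> params = new HashMap<>();
        params.put("run.date", "2008-01-01"); // hypothetical parameter name
        repo.recordExecution(new InstanceKey("ccSettlement", params), "FAILED");
        repo.recordExecution(new InstanceKey("ccSettlement", params), "COMPLETED");
        return repo;
    }

    public static void main(String[] args) {
        ToyJobRepository repo = runSettlementExample();
        Map<String, String> params = new HashMap<>();
        params.put("run.date", "2008-01-01");
        System.out.println("instances: " + repo.instanceCount());
        System.out.println("executions: "
                + repo.executionsFor(new InstanceKey("ccSettlement", params)));
    }
}
```

The re-run lands on the same key, so the repository ends up with one instance holding two executions, which is the "one JobInstance but two JobExecutions" situation described above.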

The next three definitions do not apply to this blog since I will not be using them here. Part II of this blog will show an example of these.

ItemReader – Abstraction used to represent an object that allows you to read in one object of interest that you want to process. In my credit card example it could be one card transaction retrieved from the database.

ItemWriter – Abstraction used to write out the final results of a batch. In the credit card example it could be a provider-specific representation of the transaction which needs to go into a file, maybe as XML or as a comma-separated flat file.

ItemProcessor – Very important. Here you can apply business logic to an item that was just read: perform computations on the object and maybe calculate more fields before passing it on to the writer to write out to the output file.
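How the three pieces above fit together can be sketched in a few lines of plain Java. The interfaces below (`SimpleReader`, `SimpleProcessor`, `SimpleWriter`) are simplified stand-ins that I made up so the example compiles on its own; they are not the Spring Batch interfaces, and the real framework adds transactions, restart and error handling around this loop. The chunk size and the `TXN,` output format are also just assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified stand-ins; NOT the Spring Batch interfaces.
interface SimpleReader<T> {
    T read(); // returns null when the input is exhausted
}

interface SimpleProcessor<I, O> {
    O process(I item);
}

interface SimpleWriter<T> {
    void write(List<T> items);
}

class ChunkLoopDemo {
    // Reads one item at a time, processes it, and writes the results in chunks.
    // Returns the total number of items handed to the writer.
    static <I, O> int run(SimpleReader<I> reader, SimpleProcessor<I, O> processor,
                          SimpleWriter<O> writer, int chunkSize) {
        int written = 0;
        List<O> chunk = new ArrayList<>();
        for (I item = reader.read(); item != null; item = reader.read()) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk));
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) { // flush the last, partially filled chunk
            writer.write(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }

    // Turns card transaction amounts (in cents) into comma-separated lines.
    static List<String> settle(List<Integer> amountsInCents, int chunkSize) {
        Iterator<Integer> source = amountsInCents.iterator();
        List<String> output = new ArrayList<>();
        SimpleReader<Integer> reader = () -> source.hasNext() ? source.next() : null;
        SimpleProcessor<Integer, String> processor =
                cents -> String.format(Locale.ROOT, "TXN,%d.%02d", cents / 100, cents % 100);
        SimpleWriter<String> writer = output::addAll;
        run(reader, processor, writer, chunkSize);
        return output;
    }

    public static void main(String[] args) {
        List<Integer> amounts = new ArrayList<>();
        amounts.add(1250);
        amounts.add(99);
        amounts.add(100000);
        for (String line : settle(amounts, 2)) {
            System.out.println(line);
        }
    }
}
```

The point of the split is that the reader and writer only know about input and output formats, while the processor holds the business logic, so each can be swapped without touching the others.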

In this blog let's go through the age-old hello world example. Our job will run a task which prints out hello world. Not much is happening here, but it shows all of the important concepts at work before Part II, where I use the reader and writer to read from a flat file and insert 200k records into the database (in about 1 minute). I wanted to throw that out to the naysayers who just hate doing batches in Java.

The Spring configuration for the job does the following:

  1. Use annotations to identify and autowire my Spring beans.
  2. Ignore the data source configuration. It is not used in this example; it is only here because of a DAO that I use in Parts II and III.
  3. Configure the job repository, using an in-memory store for this example.
  4. Define the job launcher.
  5. Register the two beans that make up the 2 steps of the job. One prints hello world and the other prints the time of day.
  6. Last but not least is the job definition itself. Note the batch:listener element, which registers a listener to track job execution.

Now here is the code for the first of the 2 steps – HelloTask.java

And here is the 2nd tasklet – TimeTask.java
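As a rough idea of the shape of these two tasklets, here is a minimal sketch. It is not the project's actual source (grab that from the download link at the end). To keep it compilable without Spring on the classpath, `Status` and `SimpleTasklet` are local stand-ins that I made up. In Spring Batch 2.0 the real interface is `org.springframework.batch.core.step.tasklet.Tasklet`, whose `execute(StepContribution, ChunkContext)` method returns a `RepeatStatus`. The class names `HelloSketch` and `TimeSketch`, the printed messages and the little step runner are assumptions for illustration as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

// Local stand-ins so the sketch compiles without Spring Batch; NOT the real API.
enum Status { FINISHED, CONTINUABLE }

interface SimpleTasklet {
    Status execute();
}

// First step: prints the greeting and reports that it is done.
class HelloSketch implements SimpleTasklet {
    private final PrintStream out;

    HelloSketch(PrintStream out) {
        this.out = out;
    }

    @Override
    public Status execute() {
        out.println("Hello World");
        return Status.FINISHED;
    }
}

// Second step: prints the current time and reports that it is done.
class TimeSketch implements SimpleTasklet {
    private final PrintStream out;

    TimeSketch(PrintStream out) {
        this.out = out;
    }

    @Override
    public Status execute() {
        out.println("Time is " + new Date());
        return Status.FINISHED;
    }
}

class TaskletDemo {
    // Runs the steps in order; a step is called again while it reports CONTINUABLE.
    static void runSteps(List<SimpleTasklet> steps) {
        for (SimpleTasklet step : steps) {
            while (step.execute() == Status.CONTINUABLE) {
                // keep calling until the step says it is finished
            }
        }
    }

    // Runs both steps and returns everything they printed.
    static String runAndCapture() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        runSteps(Arrays.<SimpleTasklet>asList(new HelloSketch(out), new TimeSketch(out)));
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(runAndCapture());
    }
}
```

Passing the `PrintStream` in through the constructor is only there to make the sketch easy to check; a tasklet that prints straight to `System.out` would do for a hello world.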

Last but not least is my test driver that launches the batch itself. I use a Spring-enabled JUnit test case to implement my driver.

Spring Batch – Part II – Flat File To Database – Read from a comma-separated file and insert 200k rows into an HSQLDB database.

Spring Batch – Part III – From Database to Flat File – Read back the 200k rows and write them out to a new file. Coming later.

You can download the Maven project from GitHub – https://github.com/thomasma/springbatch3part.