Friday 27 August 2021

Async/Parallel ABAP in an OO way

Motivation

Every now and then we need to do some work in parallel. Recently I worked on a process that took far too long to be of any use. It read large amounts of data and created an even larger number of records, based on complex rules, validations and enrichment.

Revisiting ABAP's parallelism techniques, my options were basically these. I could split the input data and then call function modules (FMs) in background task or starting new task. Or I could write a report and schedule jobs, with each job processing a range of a selection criterion.

All valid options, I suppose, but I had gotten used to working purely in OO. Hammering FMs in here and there to get my parallelism had a particularly bad smell. For starters, it would prevent me from asserting some critical behavior via unit tests. It would also push this new responsibility, running in parallel, into odd places, with code that looked boring and repetitive (take data from classes, run FM, take data back…).

So I decided to find an OO approach to parallelism in ABAP, taking Java's Thread as inspiration. I was particularly drawn to it because of its simple Runnable interface. That interface lets developers quickly wrap what needs to run in parallel in a single method, so I could use it to simply call the methods I already had. Below is an example of how this looks in Java.

public class MyRunnable implements Runnable {
    @Override
    public void run() {
        myOtherClassMethodThatDoesStuff(); // the existing logic you want to run in parallel
    }
}

Runnable runnable = new MyRunnable(); // or an anonymous class, or a lambda...
Thread thread = new Thread(runnable);
thread.start();

zthread

There are probably hundreds of similar implementations out there, and I hope this one is useful to some of you. The concept is the same as in Java: a Runnable interface and a Thread class as the starting point. I'm used to (and like) the callback idea behind JavaScript, so I've thrown that into the mix too. It made some of the code I needed easier to write and test. It looks something like this:


[Class diagram, created with PlantUML (http://plantuml.com)]

There is also a thread factory interface, with some default implementations, to help unit test classes that use threads. I've left it out of the diagram.
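The zthread factory itself isn't shown here. As a rough analogy only, Java code usually solves the same testability problem by injecting a java.util.concurrent.Executor. Production code passes a real pool. A unit test passes a direct executor (`Runnable::run`), so the work runs synchronously and its effects can be asserted right away. The class `ReportBuilder` below is hypothetical and only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executor;

// Hypothetical class that delegates its parallel work to an injected Executor,
// so a test can replace the real pool with a synchronous one.
class ReportBuilder {
    private final Executor executor;
    private final List<String> lines = Collections.synchronizedList(new ArrayList<>());

    ReportBuilder(Executor executor) {
        this.executor = executor;
    }

    void build(List<String> items) {
        for (String item : items) {
            // a real pool runs this on another thread; Runnable::run runs it inline
            executor.execute(() -> lines.add(item.toUpperCase()));
        }
    }

    List<String> lines() {
        return lines;
    }
}
```

In a test, `new ReportBuilder(Runnable::run)` makes `build` fully deterministic. In production code you would pass something like `Executors.newFixedThreadPool(4)` instead.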

Examples


Check the Git repository for the most up-to-date documentation. Here are some examples that show how to use it in general.

Implementing the runnable

To define the work you want to run in parallel, create a class that implements the zif_runnable interface.

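class zcl_my_runnable definition.
    public section.
        interfaces zif_runnable.
        methods:
            constructor
                importing
                    "tt_my_data is a placeholder for your own table type
                    data_for_async_processing type tt_my_data.
    private section.
        "attributes are serialized together with the runnable
        data: my_data_for_async_processing type tt_my_data.
endclass.

class zcl_my_runnable implementation.

    method constructor.
        my_data_for_async_processing = data_for_async_processing.
    endmethod.

    method zif_runnable~run.
        loop at my_data_for_async_processing into data(entry).
        "... process each entry
        endloop.
    endmethod.

endclass.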
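For comparison, the Java counterpart of zcl_my_runnable is a class that receives its slice of data in the constructor and only processes it inside run(). The class name `ChunkSummer` and the summing logic are illustrative, not part of zthread.

```java
import java.util.List;

// Illustrative Java counterpart of zcl_my_runnable: the data a task works on
// is handed over in the constructor and only touched inside run().
class ChunkSummer implements Runnable {
    private final List<Integer> chunk;
    private long sum;

    ChunkSummer(List<Integer> chunk) {
        this.chunk = chunk;
    }

    @Override
    public void run() {
        long total = 0;
        for (int value : chunk) {
            total += value;
        }
        sum = total;
    }

    long sum() {
        return sum;
    }
}
```

Passing the data in through the constructor keeps each task independent of shared state. This matters even more in zthread, where the runnable is copied into another session.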

Thread creation, start, join

Simply pass your runnable to a thread and start it. The static method zcl_thread=>join_all waits for all asynchronous processing to finish.

data(dataSplit1) = large_data. "from 1 to 100k for example
data(dataSplit2) = large_data. "from 100k to 200k for example

data(runnable1) = new zcl_my_runnable( dataSplit1 ).
data(runnable2) = new zcl_my_runnable( dataSplit2 ).

new zcl_thread( runnable1 )->start( ).
new zcl_thread( runnable2 )->start( ).

zcl_thread=>join_all( ). "awaits all threads to finish
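In plain Java, the equivalent of join_all is calling Thread.join() on every thread you started. The sketch below assumes a simple even split of the input. The helper names `splitInto` and `parallelSum` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class SplitAndJoin {
    // Splits data into `parts` contiguous slices of (almost) equal size.
    static <T> List<List<T>> splitInto(List<T> data, int parts) {
        List<List<T>> slices = new ArrayList<>();
        int size = data.size();
        for (int i = 0; i < parts; i++) {
            slices.add(data.subList(i * size / parts, (i + 1) * size / parts));
        }
        return slices;
    }

    // Starts one thread per slice, then waits for all of them (the join_all step).
    static long parallelSum(List<Integer> data, int parts) {
        AtomicLong total = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (List<Integer> slice : splitInto(data, parts)) {
            Thread thread = new Thread(() -> {
                long local = 0;
                for (int value : slice) {
                    local += value;
                }
                total.addAndGet(local);
            });
            thread.start();
            threads.add(thread);
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // blocks until this thread has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return total.get();
    }
}
```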

Thread results and callbacks

You can get the result of your thread and also use the callback mechanism to collect all results in a single, well-defined place.

Defining the result

First, define your result by implementing the serializable interface zif_runnable_result. Here the runnable is its own result, with a setter and a getter to access it.

class zcl_my_runnable definition.
    public section.
        "implementing both interfaces
        interfaces: zif_runnable, zif_runnable_result.

        methods:
            "result type here could be structures, tables, fields, etc.
            "(some_structure is a placeholder)
            set_my_result
                importing iv_result type some_structure,
            get_result
                returning value(rv_result) type some_structure.

    private section.
        data: my_result_value type some_structure.

endclass.

class zcl_my_runnable implementation.

    method set_my_result.
        my_result_value = iv_result.
    endmethod.

    method get_result.
        rv_result = my_result_value.
    endmethod.

    method zif_runnable~run.
        "async processing populating some result (placeholder value here)
        data(my_result) = value some_structure( ).
        set_my_result( my_result ).
        "as my runnable implements the runnable result interface, this is possible,
        "but this could just as well be new zcl_my_runnable_result( my_result ).
        ro_result = me.
    endmethod.

endclass.
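In Java, the usual way for a task to hand back a result is a `Callable<V>` submitted to an executor, which returns a `Future<V>`. The sketch below shows that pattern for comparison only. zthread returns its result through zif_runnable_result instead. The method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ResultExample {
    // A task that returns its result instead of only producing side effects.
    static Callable<Integer> countEven(List<Integer> chunk) {
        return () -> (int) chunk.stream().filter(v -> v % 2 == 0).count();
    }

    // Runs one task per chunk and adds up the results as they are collected.
    static int totalEven(List<List<Integer>> chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, chunks.size()));
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<Integer> chunk : chunks) {
                futures.add(pool.submit(countEven(chunk)));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // blocks until that task is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            // the exception thrown inside the task is available as the cause
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```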

Implementing the callback

Implement the zif_thread_callback interface and pass it to the thread together with your runnable. The callback object must still be valid when the thread finishes. The example below shows how this works; method split_into_2 contains the main program logic.

class zcl_my_program definition.

    public section.
        "my program implements the thread callback,
        "but this can be a separate object.
        interfaces zif_thread_callback.
        methods:
           split_into_2
               importing
                   "tt_my_data is a placeholder for your own table type
                   large_data type tt_my_data.
    private section.
       data: my_total type some_structure.

endclass.

class zcl_my_program implementation.

    method zif_thread_callback~on_result.

        data(lo_my_runnable_result) = cast zcl_my_runnable( io_result ).
        data(ls_result) = lo_my_runnable_result->get_result( ).
        "structures can't be added with +, so sum a numeric component;
        "amount is a placeholder component of some_structure
        my_total-amount = my_total-amount + ls_result-amount.

    endmethod.

   method zif_thread_callback~on_error.
        "this method is triggered if any error happens during runnable execution.
        "you can raise your own exceptions here as well.
        "taskname is available in both on_result and on_error;
        "it is an optional parameter of the thread constructor.
        raise exception type zcx_my_calculation_error
            exporting
                previous = io_error
                taskname = iv_taskname.

    endmethod.

   method split_into_2.
     data(split1) = large_data. "from 1 to 100k
     data(split2) = large_data. "from 100k to 200k

     data(runnable1) = new zcl_my_runnable( split1 ).
     data(runnable2) = new zcl_my_runnable( split2 ).

     "as my class implements the callback interface, I'm passing myself,
     "but this could be another object as well
     new zcl_thread( io_runnable = runnable1 io_callback = me )->start( ).
     new zcl_thread( io_runnable = runnable2 io_callback = me )->start( ).

     zcl_thread=>join_all( ). "waits for both threads to finish

     write: / my_total-amount.
   endmethod.

endclass.
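The same result/error callback split can be sketched in Java with `CompletableFuture`. The sketch uses `handle`, so that a failed task calls the error callback instead of breaking the final wait. `allOf(...).join()` then plays the role of join_all. `onResult` and `onError` only mirror the zthread callback names; this is not its API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

class CallbackExample {
    final AtomicLong total = new AtomicLong();
    final AtomicInteger errors = new AtomicInteger();

    // Counterpart of on_result: receives each task's result.
    void onResult(long result) {
        total.addAndGet(result);
    }

    // Counterpart of on_error: receives the exception a task threw.
    void onError(Throwable error) {
        errors.incrementAndGet();
    }

    long splitAndSum(List<List<Integer>> chunks) {
        List<CompletableFuture<Void>> tasks = new ArrayList<>();
        for (List<Integer> chunk : chunks) {
            CompletableFuture<Void> task = CompletableFuture
                .supplyAsync(() -> chunk.stream().mapToLong(Integer::longValue).sum())
                .handle((result, error) -> {
                    if (error != null) {
                        onError(error);
                    } else {
                        onResult(result);
                    }
                    return null;
                });
            tasks.add(task);
        }
        // join_all: returns once every callback above has run
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
        return total.get();
    }
}
```

As in the ABAP version, the object holding the callbacks has to stay alive until all tasks are done. Here that is guaranteed because `splitAndSum` blocks on `allOf`.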

Some thoughts


Feel free to check the code behind it. Under the hood, the runnable is serialized so it can be passed to a function module as a string. The FM is called with the starting new task addition, and the result is picked up via on end of task. In the FM, the runnable is deserialized and its run method is called. Any result or error is serialized by the FM and deserialized back by the thread class. If a callback is defined, it is called with the result or error. This has a few consequences:

◉ Avoid complex runnables: all serialization constraints of the id transformation apply. If your runnable depends on other objects, prefer instantiating those objects inside the run method.

◉ Parallel work runs in dialog work processes. As far as I can tell, the only restrictions are memory allocation and maximum runtime (5 minutes by default). Make sure you split the data into chunks that fit within these dialog work process limits.

◉ Each thread runs in a separate session, on serialized/deserialized copies of your objects. So don't fall into the trap of expecting a singleton to be shared between threads.

◉ If your server has 2 processors, starting 20 threads won't make things faster; they will just share the CPU time round-robin. Most production servers have enough work processes available, but that is not always the case for development servers. Account for this when developing, testing and deploying.

◉ Threads can't be stopped. Unfortunately, I haven't found a way to get the PID of a thread from its task name.

◉ If a callback routine is defined, it must still be a valid reference when the thread finishes.
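The copy semantics described in these points can be shown with Java serialization. This is only an analogy; zthread uses the ABAP id transformation, not Java streams. The sketch serializes a task and runs the deserialized copy, the way the FM does. It then ships the copy back and shows that the original object was never touched. `CounterTask` and `runOnCopy` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class RoundTrip {
    // A serializable task: its fields travel with it, and anything it
    // references must be serializable as well.
    static class CounterTask implements Runnable, Serializable {
        private static final long serialVersionUID = 1L;
        int count;

        @Override
        public void run() {
            count += 10;
        }
    }

    static byte[] serialize(Object object) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mimics the flow: send a copy, run it "remotely", send the result back.
    static CounterTask runOnCopy(CounterTask task) {
        CounterTask remoteCopy = deserialize(serialize(task));
        remoteCopy.run();
        return deserialize(serialize(remoteCopy));
    }
}
```

Any state the task changes lives only in the copy, so results must come back through the serialized result, never through shared objects.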

I really wish SAP provided an official implementation, one that would let us use something like a ThreadPoolExecutor. That way, no matter how many threads I start, only x of them would run at the same time in the pool. I'd also like something that would let us implement the dining philosophers problem without having to implement shared memory area classes.

But as I say this, I also realize we are moving away from heavy processes towards more stream-like flows with small API endpoints. In any case, it is still a tool that can be used when needed.
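For reference, Java provides the bounded pool wished for above: `Executors.newFixedThreadPool(n)` returns a `ThreadPoolExecutor` that runs at most n tasks at once and queues the rest. This is plain Java, not something zthread offers. The sketch submits many short tasks and records the peak number running at the same time. `maxConcurrency` is a hypothetical helper.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPool {
    // Submits `tasks` jobs to a pool of `poolSize` threads and returns the
    // highest number of jobs observed running at the same time.
    static int maxConcurrency(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // simulated work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }
}
```

However many tasks are submitted, the peak never exceeds the pool size. This is the property that would keep a 2-processor development server from being flooded.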
