Monday 17 July 2017

Use Regular Expression to parse the image reference in the markdown source code

As mentioned in my blog “A Github repository issue backup tool developed by ABAP”, I am developing a tool in ABAP that backs up my GitHub issues.

One feature I would like to implement is backing up the images used in an issue's markdown as well. Take this issue of mine for example: it contains references to image files that I uploaded by dragging local image files and dropping them into the browser.



The syntax is ![local file name, which may be empty](image url). Here is an example:
![clipboard7](https://user-images.githubusercontent.com/5669954/28164018-f15fee1a-67ff-11e7-9ef0-ea44b3e3a049.png)

Since I have already downloaded the source code of all GitHub issues belonging to a given GitHub repository into the ABAP system, my next task is to develop a function module that accepts the markdown source code of an issue as input and returns an internal table listing the name and URL of each referenced image.


Use JavaScript regular expression to parse image reference in markdown source code


I have written the following code to parse the image references in one of my GitHub issues:

<html>
<script>
const regex = /!\[(.*?)\]\((.*?)\)/g;
const str = `# Created by Jerry Wang, last modified on Jun 24, 2014

As long as the user parameter BSPWD_USER_LEVEL has a value greater than 5, the technical information of an error message is displayed in the WebClient UI:
![clipboard1](https://user-images.githubusercontent.com/5669954/28164017-f102028c-67ff-11e7-81f0-a4e8b0e48fd0.png)
![clipboard2](https://user-images.githubusercontent.com/5669954/28164011-f0781176-67ff-11e7-8779-6db53bb01be6.png)
![clipboard3](https://user-images.githubusercontent.com/5669954/28164013-f094ce10-67ff-11e7-9b47-d26fa4413599.png)
![clipboard4](https://user-images.githubusercontent.com/5669954/28164014-f0998676-67ff-11e7-9126-8c0d609b0607.png)

This is where the icon used to display the message in the UI is determined:
![clipboard5](https://user-images.githubusercontent.com/5669954/28164019-f16258f8-67ff-11e7-9aec-43538d1ce54f.png)
The check for message level > 5:
![clipboard6](https://user-images.githubusercontent.com/5669954/28164015-f09c7570-67ff-11e7-9c06-da380d4a0c06.png)
![clipboard7](https://user-images.githubusercontent.com/5669954/28164018-f15fee1a-67ff-11e7-9ef0-ea44b3e3a049.png)
![clipboard8](https://user-images.githubusercontent.com/5669954/28164016-f0d067e0-67ff-11e7-9445-baf3af4c1494.png)`;
let m;
// Print the local file name (image name plus the URL's file extension) and the URL.
const printResult = (array) => {
    const url = array[2];
    const splited = url.split(".");
    console.log("local file: " + array[1] + "." + splited[splited.length - 1]);
    console.log("url: " + url);
};
while ((m = regex.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    printResult(m);
}
</script>
</html>

The output shows that this regular expression works as expected.



Solution in ABAP


When I copied the working regular expression !\[(.*?)\]\((.*?)\) from JavaScript to ABAP, it did not work: ABAP rejects this regular expression as invalid.


From the Stack Overflow discussion thread “Greedy/non-greedy quantifiers in ABAP regular expressions” I learned that the ? symbol, which requests non-greedy matching in a regular expression, is not supported in ABAP.
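
To make the difference concrete, here is a small JavaScript sketch comparing the lazy pattern with its greedy counterpart on one line that contains two image references. The sample text and URLs are made up. The third pattern uses negated character classes, a commonly used way to avoid lazy quantifiers; it is not the approach taken in this post and has not been tried in ABAP here.

```javascript
// Sample text with two image references on one line (made-up URLs).
const text = "![a](http://example.com/1.png) and ![b](http://example.com/2.png)";

// Lazy quantifiers: each .*? stops at the first possible "]" or ")".
const lazy = /!\[(.*?)\]\((.*?)\)/g;
// Greedy quantifiers: each .* runs as far as it can, so both references
// are swallowed into a single match.
const greedy = /!\[(.*)\]\((.*)\)/g;
// Negated character classes: stop at "]" or ")" without a lazy quantifier.
const negated = /!\[([^\]]*)\]\(([^)]*)\)/g;

// Collect the full text of every match of the given regular expression.
const matchesOf = (regex) => Array.from(text.matchAll(regex), (m) => m[0]);

console.log(matchesOf(lazy));    // two matches, one per image reference
console.log(matchesOf(greedy));  // one match covering the whole line
console.log(matchesOf(negated)); // two matches, one per image reference
```

Without the ? symbol, a plain .* behaves like the greedy variant, which is why the pattern cannot simply be copied over with the question marks removed.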

So I had to find an alternative that produces the same output as the JavaScript version.

Approach 1 – use dynamically generated regular expression


My idea is to match all occurrences of “![xxx](yyyy)” with a dynamically generated regular expression: the code counts the occurrences of “![” and chains that many greedy capture groups, separated by .*, into one pattern.
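
The idea can be tried out in JavaScript first. The sketch below mirrors the pattern construction only; the function names and sample text are made up for illustration, and the "s" flag is used so that "." can span line breaks in JavaScript.

```javascript
// The greedy building block: one capture group per image reference.
const IMAGE_PATTERN = "(!\\[.*\\]\\(.*\\))";

// Count the "![" occurrences and chain that many capture groups with ".*".
function buildChainedPattern(source) {
  const count = source.split("![").length - 1;
  if (count === 0) return null;
  return Array(count).fill(IMAGE_PATTERN).join(".*");
}

// Run the chained pattern once and return the captured image references.
function findImageReferences(source) {
  const pattern = buildChainedPattern(source);
  if (pattern === null) return [];
  const match = new RegExp(pattern, "s").exec(source);
  return match === null ? [] : match.slice(1);
}

// Made-up sample with two image references on separate lines.
const sample =
  "intro\n![one](http://example.com/1.png)\ntext\n![two](http://example.com/2.png)";

console.log(buildChainedPattern(sample));
// (!\[.*\]\(.*\)).*(!\[.*\]\(.*\))
console.log(findImageReferences(sample));
// [ '![one](http://example.com/1.png)', '![two](http://example.com/2.png)' ]
```

Because exactly as many groups are required as there are “![” markers, each greedy group is forced to stop before the next marker. The pattern grows with every additional image, so its size depends on the document being parsed.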

I have written another tool class to achieve this requirement:

class CL_ABAP_GIT_ISSUE_IMAGE_TOOL definition
  public
  final
  create public .
public section.
  types:
    BEGIN OF ty_image_reference,
           image_name TYPE string,
           image_url TYPE string,
         END OF ty_image_reference .
  types:
    tt_image_reference TYPE TABLE OF ty_image_reference with key image_name .

  class-methods GET_IMAGE_REFERENCE
    importing
      !IV_ISSUE_SOURCE_CODE type STRING
    returning
      value(RT_IMAGE) type TT_IMAGE_REFERENCE .
protected section.
private section.
  class-data SV_IMAGE_PATTERN type STRING value '(!\[.*\]\(.*\))' ##NO_TEXT.
ENDCLASS.
CLASS CL_ABAP_GIT_ISSUE_IMAGE_TOOL IMPLEMENTATION.
  METHOD get_image_reference.
    DATA: lv_reg_pattern TYPE string,
          lt_result_tab  TYPE match_result_tab.
    FIND ALL OCCURRENCES OF '![' IN iv_issue_source_code MATCH COUNT DATA(lv_count).
    CHECK lv_count > 0.
    lv_reg_pattern = sv_image_pattern.
    IF lv_count > 1.
      DO lv_count - 1 TIMES.
        lv_reg_pattern = lv_reg_pattern && '.*' && sv_image_pattern.
      ENDDO.
    ENDIF.
    TRY.
        FIND ALL OCCURRENCES OF REGEX lv_reg_pattern
             IN iv_issue_source_code
             RESULTS lt_result_tab.
      CATCH cx_root INTO DATA(cx_root).
        WRITE:/ cx_root->get_text( ).
        RETURN.
    ENDTRY.
    READ TABLE lt_result_tab ASSIGNING FIELD-SYMBOL(<result>) INDEX 1.
    CHECK sy-subrc = 0.
    LOOP AT <result>-submatches ASSIGNING FIELD-SYMBOL(<match>).
      WRITE:/ 'Match...........'.
      WRITE:/ iv_issue_source_code+<match>-offset(<match>-length).
    ENDLOOP.
  ENDMETHOD.
ENDCLASS.

Testing with the following source code shows that this approach works as expected:
SELECT SINGLE * INTO @DATA(ls) FROM crmd_git_issue WHERE repo_name = 'KM' AND
   issue_num = 282.
cl_abap_git_issue_image_tool=>get_image_reference( ls-issue_body ).


However, if a markdown document contains too many image references, the dynamically generated regular expression raises the exception CX_SY_REGEX_TOO_COMPLEX. I therefore had to look for an alternative solution.


Approach 2 – use ABAP to consume a regular expression parsing service implemented in JavaScript


As the JavaScript part shows, we already have an elegant JavaScript solution for parsing the local file name and file URL, so why not consume its result directly in our ABAP code?

step 1 – upload the JavaScript solution to SCP

Once the JavaScript solution is uploaded to SCP (SAP Cloud Platform), it can act as a RESTful service endpoint that any other programming language can consume.
Please follow my blog “Deploy your web application to SAP Cloud Platform which can access resource from On-Premise ABAP system” to upload this Node.js application to SAP Cloud Platform. Once it is deployed, open the URL https://jerrylist.cfapps.eu10.hana.ondemand.com in a browser; a “hello World” response is expected.
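
The service code itself is not listed in this post. As a rough sketch of what such an endpoint could look like, the following uses Node.js's built-in http module. The request format (markdown text sent as the POST body), the JSON response shape with image_name and image_url fields, and the function names are all assumptions made for illustration, not the deployed implementation.

```javascript
const http = require("http");

// Extract { image_name, image_url } records from markdown text.
function parseImageReferences(markdown) {
  const regex = /!\[(.*?)\]\((.*?)\)/g;
  return Array.from(markdown.matchAll(regex), (m) => ({
    image_name: m[1],
    image_url: m[2],
  }));
}

// Build (but do not start) an HTTP server: POST markdown in, JSON out.
// Any other request gets a plain "hello World" reply (assumed behavior).
function createImageService() {
  return http.createServer((request, response) => {
    if (request.method !== "POST") {
      response.writeHead(200, { "Content-Type": "text/plain" });
      response.end("hello World");
      return;
    }
    let body = "";
    request.setEncoding("utf8");
    request.on("data", (chunk) => { body += chunk; });
    request.on("end", () => {
      response.writeHead(200, { "Content-Type": "application/json" });
      response.end(JSON.stringify(parseImageReferences(body)));
    });
  });
}

// To run it locally (the port is an assumption):
// createImageService().listen(process.env.PORT || 3000);
```

Keeping the parsing in a plain function separate from the HTTP handling means the regular expression logic can be checked on its own before anything is deployed.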



step 2 – consume the JavaScript service in ABAP

I have encapsulated the logic for sending the HTTP request to SAP Cloud Platform in the utility class cl_abap_git_issue_image_tool (the source code can be found on my GitHub).
Test code:

SELECT SINGLE * INTO @DATA(ls) FROM crmd_git_issue WHERE repo_name = 'KM' AND issue_num = 282.
DATA(lt_image) = cl_abap_git_issue_image_tool=>get_image_ref_via_js_service( ls-issue_body ).

All 8 images referenced in this issue are parsed as expected.
